[Pacemaker] crm resource restart doesn't restart the correct resource

Vadym Chepkov vchepkov at gmail.com
Thu Nov 25 12:09:28 UTC 2010


On Nov 25, 2010, at 7:01 AM, Pavlos Parissis wrote:

> On 25 November 2010 12:44, Vadym Chepkov <vchepkov at gmail.com> wrote:
>> 
>> On Nov 25, 2010, at 6:31 AM, Pavlos Parissis wrote:
>> 
>>> Hi,
>>> When I issue crm resource restart pbx_01, the PE restarts the wrong resource.
>>> pbx_01 belongs to a resource group, and it is the last resource of that
>>> group that gets restarted.
>> 
>> This is why the cluster has groups. Groups define colocation and ordering, so if you stop a resource, everything that depends on it has to be stopped as well, and the group describes this dependency.
> If that were the case, then sshd_01 should have been restarted as well.

Well, it tried, but failed; I see it in the log.
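To make the group semantics concrete: in a group, each member depends on the members listed before it, so stopping one member means stopping it and every member after it, last member first. The PE output further down (Stop pbx_01, sshd_01, mailAlert_01, with mailAlert_01_stop_0 issued first) matches that rule. A minimal POSIX shell sketch of the rule, assuming a plain group with the default ordering and colocation; the function name group_stop_order is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the group members that must be stopped when
# "target" is stopped, in the order they would be stopped (last member first).
# Assumes a plain ordered, colocated group (the default for a crm "group").
group_stop_order() {
    target=$1
    shift
    found=0
    order=""
    for member in "$@"; do
        if [ "$member" = "$target" ]; then
            found=1
        fi
        if [ "$found" -eq 1 ]; then
            # Prepend, so members listed later in the group are stopped first.
            order="$member${order:+ $order}"
        fi
    done
    # Target is not a member of the group: report failure.
    [ "$found" -eq 1 ] || return 1
    echo "$order"
}

# Member list taken from "group pbx_service_01 ..." in the configuration.
group_stop_order pbx_01 ip_01 fs_01 pbx_01 sshd_01 mailAlert_01
```

With the pbx_service_01 member list it prints "mailAlert_01 sshd_01 pbx_01", the same set of resources the PE scheduled for stopping.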


> 
>> 
>> I would say you have to:
>> 
>> crm resource unmanage pbx_01
>> /etc/init.d/znd-pbx_01 restart
>> crm resource meta pbx_01 delete is-managed
> 
> I am not really convinced that I have to go down this path.
> 
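For reference, the unmanage / restart / re-manage sequence suggested above can be wrapped so that the resource is always handed back to the cluster, even if the init script's restart fails. This is only a sketch: restart_unmanaged, CRM and INIT_DIR are hypothetical names, and the crm subcommands are exactly the ones quoted above.

```shell
#!/bin/sh
# Hypothetical wrapper around the three commands suggested above.
# CRM and INIT_DIR are assumptions and can be overridden (e.g. for a dry run).
CRM=${CRM:-crm}
INIT_DIR=${INIT_DIR:-/etc/init.d}

restart_unmanaged() {
    rsc=$1      # cluster resource id, e.g. pbx_01
    script=$2   # init script name, e.g. znd-pbx_01
    "$CRM" resource unmanage "$rsc" || return 1
    "$INIT_DIR/$script" restart
    rc=$?
    # Hand control back to the cluster even if the restart failed.
    "$CRM" resource meta "$rsc" delete is-managed || return 1
    return "$rc"
}

# Example with the names from this thread:
#   restart_unmanaged pbx_01 znd-pbx_01
```

The function returns the init script's exit code, so a failed restart is still reported after the resource has been returned to cluster management.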
>> 
>> Vadym
>> 
>> 
>> 
>>> pbx_01 is an lsb (init script) resource, and the init script
>>> supports restart, as we can see below:
>>> 
>>> [root at pbxsrv1 ~]# /etc/init.d/znd-pbx_01
>>> Usage: /etc/init.d/znd-pbx_01 {start|stop|restart|force-reload|status}
>>> [root at pbxsrv1 ~]# grep -A 5 restart /etc/init.d/znd-pbx_01
>>>        restart)
>>>                stop
>>>                start
>>>                ;;
>>>        force-reload)
>>>                stop
>>> --
>>>                echo $"Usage: $0 {start|stop|restart|force-reload|status}"
>>>                exit 2
>>> esac
>>> exit $RETVAL
>>> 
>>> This happens on 1.0.9, 1.0.10 and 1.1.3.
>>> 
>>> Here is the log on the node where the resource group is running:
>>> 
>>> 12:04:43 pbxsrv1 lrmd: [8710]: info: cancel_op: operation monitor[28]
>>> on ocf::MailTo::mailAlert_01 for client 8713, its parameters:
>>> CRM_meta_interval=[2000] CRM_meta_timeout=[10000] email=[root]
>>> crm_feature_set=[3.0.1] subject=[[Zanadoo Clustet event]
>>> pbx_service_01] CRM_meta_name=[monitor]  cancelled
>>> 12:04:43 pbxsrv1 lrmd: [8710]: info: rsc:mailAlert_01:29: stop
>>> 12:04:44 pbxsrv1 lrmd: [8710]: info: rsc:mailAlert_01:30: start
>>> 12:04:47 pbxsrv1 lrmd: [8710]: info: rsc:mailAlert_01:31: monitor
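The lrmd lines above show that only mailAlert_01 was stopped and started on pbxsrv1. With a longer log, a filter along these lines can list which resources actually received operations. It assumes the "rsc:<name>:<call id>: <operation>" format shown above, with one entry per line as in the original syslog (the mail wraps some entries); lrmd_ops is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: turn lrmd "rsc:<name>:<call id>: <op>" log entries
# into "<name> <op>" pairs; all other lines are ignored.
lrmd_ops() {
    sed -n 's/.*lrmd: \[[0-9]*]: info: rsc:\([^:]*\):[0-9]*: \([a-z_]*\).*/\1 \2/p'
}

# Example using the pbxsrv1 entries quoted above.
lrmd_ops <<'EOF'
12:04:43 pbxsrv1 lrmd: [8710]: info: rsc:mailAlert_01:29: stop
12:04:44 pbxsrv1 lrmd: [8710]: info: rsc:mailAlert_01:30: start
12:04:47 pbxsrv1 lrmd: [8710]: info: rsc:mailAlert_01:31: monitor
EOF
```

On these three entries it prints "mailAlert_01 stop", "mailAlert_01 start" and "mailAlert_01 monitor", one per line; no operation on pbx_01 appears.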
>>> 
>>> 
>>> And this is the log of the DC node:
>>> 
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: - <cib
>>> admin_epoch="0" epoch="78" num_updates="85" >
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: abort_transition_graph:
>>> need_abort:59 - Triggered transition abort (complete=1) : Non-status
>>> change
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> <configuration >
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: need_abort: Aborting on change to
>>> admin_epoch
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> <resources >
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: do_state_transition: State
>>> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
>>> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> <group id="pbx_service_01" >
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: do_state_transition: All 3
>>> cluster nodes are eligible to run resources.
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>   <primitive id="pbx_01" >
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: do_pe_invoke: Query 55:
>>> Requesting the current CIB: S_POLICY_ENGINE
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>     <meta_attributes id="pbx_01-meta_attributes" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>       <nvpair value="Started" id="pbx_01-meta_attributes-target-role"
>>> />
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>     </meta_attributes>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>   </primitive>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -       </group>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> </resources>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> </configuration>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: - </cib>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: + <cib
>>> admin_epoch="0" epoch="79" num_updates="1" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> <configuration >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> <resources >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> <group id="pbx_service_01" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>   <primitive id="pbx_01" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>     <meta_attributes id="pbx_01-meta_attributes" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>       <nvpair value="Stopped" id="pbx_01-meta_attributes-target-role"
>>> />
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>     </meta_attributes>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>   </primitive>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +       </group>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> </resources>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> </configuration>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: + </cib>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: cib_process_request: Operation
>>> complete: op cib_replace for section resources
>>> (origin=local/cibadmin/2, version=0.79.1): ok (rc=0)
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: do_pe_invoke_callback: Invoking
>>> the PE: query=55, ref=pe_calc-dc-1290683083-98, seq=3, quorate=1
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: unpack_config: Node scores:
>>> 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: determine_online_status: Node
>>> pbxsrv3 is online
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: determine_online_status: Node
>>> pbxsrv2 is online

LSB script not compliant?

>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: unpack_rsc_op: Hard error -
>>> sshd_01_monitor_0 failed with rc=5: Preventing sshd_01 from
>>> re-starting on pbxsrv2
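On the LSB question: for the probe above, rc=5 corresponds to the OCF "not installed" return code. Separately, whether an init script behaves as the cluster expects can be checked from its exit codes. Per the LSB specification, status must return 0 when the service is running and 3 when it is stopped, and start and stop must return 0 even when the service is already in the requested state. A rough sketch of such a check follows; run it on a node where the service is stopped and not under cluster control (for example a test node, or with the resource unmanaged). lsb_expect and lsb_check are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: run a command and compare its exit code to the expected one.
lsb_expect() {
    expected=$1
    shift
    "$@" >/dev/null 2>&1
    got=$?
    if [ "$got" -ne "$expected" ]; then
        echo "FAIL: $* returned $got, expected $expected"
        return 1
    fi
}

# Exercise an init script through a stopped -> running -> stopped cycle.
# Assumes the service is stopped when the check begins.
lsb_check() {
    s=$1
    rc=0
    lsb_expect 3 "$s" status || rc=1   # stopped
    lsb_expect 0 "$s" start  || rc=1
    lsb_expect 0 "$s" status || rc=1   # running
    lsb_expect 0 "$s" start  || rc=1   # already running: must still return 0
    lsb_expect 0 "$s" stop   || rc=1
    lsb_expect 0 "$s" stop   || rc=1   # already stopped: must still return 0
    lsb_expect 3 "$s" status || rc=1   # stopped again
    return "$rc"
}

# Example: lsb_check /etc/init.d/znd-sshd-pbx_01
```

Each failing step is printed with the code it returned, which makes it easy to see whether, for example, stop on an already-stopped service returns non-zero.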



>>> 12:04:43 pbxsrv3 pengine: [6396]: info: determine_online_status: Node
>>> pbxsrv1 is online
>>> 12:04:43 pbxsrv3 cib: [7628]: info: write_cib_contents: Archived
>>> previous version as /var/lib/heartbeat/crm/cib-28.raw
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: unpack_rsc_op: Hard error -
>>> sshd_02_monitor_0 failed with rc=5: Preventing sshd_02 from
>>> re-starting on pbxsrv1
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: find_clone: Internally renamed
>>> drbd_02:0 on pbxsrv1 to drbd_02:2 (ORPHAN)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: group_print:  Resource
>>> Group: pbx_service_01
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:      ip_01
>>> (ocf::heartbeat:IPaddr2):       Started pbxsrv1
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:      fs_01
>>> (ocf::heartbeat:Filesystem):    Started pbxsrv1
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:      pbx_01
>>> (lsb:znd-pbx_01):       Started pbxsrv1
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:      sshd_01
>>> (lsb:znd-sshd-pbx_01):  Started pbxsrv1
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:
>>> mailAlert_01       (ocf::heartbeat:MailTo):        Started pbxsrv1
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: clone_print:  Master/Slave
>>> Set: ms-drbd_01
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: short_print:      Masters: [ pbxsrv1 ]
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: short_print:      Slaves: [ pbxsrv3 ]
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: group_print:  Resource
>>> Group: pbx_service_02
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:      ip_02
>>> (ocf::heartbeat:IPaddr2):       Started pbxsrv2
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:      fs_02
>>> (ocf::heartbeat:Filesystem):    Started pbxsrv2
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:      pbx_02
>>> (lsb:znd-pbx_02):       Started pbxsrv2
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:      sshd_02
>>> (lsb:znd-sshd-pbx_02):  Started pbxsrv2
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: native_print:
>>> mailAlert_02       (ocf::heartbeat:MailTo):        Started pbxsrv2
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: clone_print:  Master/Slave
>>> Set: ms-drbd_02
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: short_print:      Masters: [ pbxsrv2 ]
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: short_print:      Slaves: [ pbxsrv3 ]
>>> 12:04:43 pbxsrv3 cib: [7628]: info: write_cib_contents: Wrote version
>>> 0.79.0 of the CIB to disk (digest: 321dfdbd8a7ecd8c46c7b7b5b43a38de)
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: master_color: Promoting
>>> drbd_01:0 (Master pbxsrv1)
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: master_color: ms-drbd_01:
>>> Promoted 1 instances of a possible 1 to master
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: native_color: Resource pbx_01
>>> cannot run anywhere
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: rsc_merge_weights: sshd_01:
>>> Rolling back scores from mailAlert_01
>>> 12:04:43 pbxsrv3 cib: [7628]: info: retrieveCib: Reading cluster
>>> configuration from: /var/lib/heartbeat/crm/cib.Sm7jHT (digest:
>>> /var/lib/heartbeat/crm/cib.2rQOuV)
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: native_color: Resource sshd_01
>>> cannot run anywhere
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: native_color: Resource
>>> mailAlert_01 cannot run anywhere
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: master_color: Promoting
>>> drbd_01:0 (Master pbxsrv1)
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: master_color: ms-drbd_01:
>>> Promoted 1 instances of a possible 1 to master
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: master_color: Promoting
>>> drbd_02:0 (Master pbxsrv2)
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: master_color: ms-drbd_02:
>>> Promoted 1 instances of a possible 1 to master
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: master_color: Promoting
>>> drbd_02:0 (Master pbxsrv2)
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: master_color: ms-drbd_02:
>>> Promoted 1 instances of a possible 1 to master
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> ip_01      (Started pbxsrv1)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> fs_01      (Started pbxsrv1)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Stop resource
>>> pbx_01      (pbxsrv1)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Stop resource
>>> sshd_01     (pbxsrv1)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Stop resource
>>> mailAlert_01        (pbxsrv1)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> drbd_01:0  (Master pbxsrv1)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> drbd_01:1  (Slave pbxsrv3)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> ip_02      (Started pbxsrv2)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> fs_02      (Started pbxsrv2)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> pbx_02     (Started pbxsrv2)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> sshd_02    (Started pbxsrv2)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> mailAlert_02       (Started pbxsrv2)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> drbd_02:0  (Master pbxsrv2)
>>> 12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> drbd_02:1  (Slave pbxsrv3)
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: do_state_transition: State
>>> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
>>> cause=C_IPC_MESSAGE origin=handle_response ]
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: unpack_graph: Unpacked transition
>>> 4: 6 actions in 6 synapses
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: do_te_invoke: Processing graph 4
>>> (ref=pe_calc-dc-1290683083-98) derived from
>>> /var/lib/pengine/pe-input-11.bz2
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: te_pseudo_action: Pseudo action
>>> 27 fired and confirmed
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: te_rsc_command: Initiating action
>>> 24: stop mailAlert_01_stop_0 on pbxsrv1
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: - <cib
>>> admin_epoch="0" epoch="79" num_updates="1" >
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: abort_transition_graph:
>>> need_abort:59 - Triggered transition abort (complete=0) : Non-status
>>> change
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> <configuration >
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: update_abort_priority: Abort
>>> priority upgraded from 0 to 1000000
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> <resources >
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: update_abort_priority: Abort
>>> action done superceeded by restart
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> <group id="pbx_service_01" >
>>> 12:04:43 pbxsrv3 crmd: [5914]: info: need_abort: Aborting on change to
>>> admin_epoch
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>   <primitive id="pbx_01" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>     <meta_attributes id="pbx_01-meta_attributes" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>       <nvpair value="Stopped" id="pbx_01-meta_attributes-target-role"
>>> />
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>     </meta_attributes>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>>   </primitive>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -       </group>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> </resources>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: -
>>> </configuration>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: - </cib>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: + <cib
>>> admin_epoch="0" epoch="80" num_updates="1" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> <configuration >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> <resources >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> <group id="pbx_service_01" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>   <primitive id="pbx_01" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>     <meta_attributes id="pbx_01-meta_attributes" >
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>       <nvpair value="Started" id="pbx_01-meta_attributes-target-role"
>>> />
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>     </meta_attributes>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>>   </primitive>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +       </group>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> </resources>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: +
>>> </configuration>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: log_data_element: cib:diff: + </cib>
>>> 12:04:43 pbxsrv3 cib: [5910]: info: cib_process_request: Operation
>>> complete: op cib_replace for section resources
>>> (origin=local/cibadmin/2, version=0.80.1): ok (rc=0)
>>> 12:04:43 pbxsrv3 pengine: [6396]: info: process_pe_message: Transition
>>> 4: PEngine Input stored in: /var/lib/pengine/pe-input-11.bz2
>>> 12:04:43 pbxsrv3 cib: [7631]: info: write_cib_contents: Archived
>>> previous version as /var/lib/heartbeat/crm/cib-29.raw
>>> 12:04:43 pbxsrv3 cib: [7631]: info: write_cib_contents: Wrote version
>>> 0.80.0 of the CIB to disk (digest: aed1806054239b7f4ea0f54bd7477c5d)
>>> 12:04:43 pbxsrv3 cib: [7631]: info: retrieveCib: Reading cluster
>>> configuration from: /var/lib/heartbeat/crm/cib.nAviBZ (digest:
>>> /var/lib/heartbeat/crm/cib.dhOIH1)
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: match_graph_event: Action
>>> mailAlert_01_stop_0 (24) confirmed on pbxsrv1 (rc=0)
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: run_graph:
>>> ====================================================
>>> 12:04:44 pbxsrv3 crmd: [5914]: notice: run_graph: Transition 4
>>> (Complete=2, Pending=0, Fired=0, Skipped=4, Incomplete=0,
>>> Source=/var/lib/pengine/pe-input-11.bz2): Stopped
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: te_graph_trigger: Transition 4 is
>>> now complete
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: do_state_transition: State
>>> transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
>>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: do_state_transition: All 3
>>> cluster nodes are eligible to run resources.
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: do_pe_invoke: Query 56:
>>> Requesting the current CIB: S_POLICY_ENGINE
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: do_pe_invoke_callback: Invoking
>>> the PE: query=56, ref=pe_calc-dc-1290683084-100, seq=3, quorate=1
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: unpack_config: Node scores:
>>> 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: determine_online_status: Node
>>> pbxsrv3 is online
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: determine_online_status: Node
>>> pbxsrv2 is online
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: unpack_rsc_op: Hard error -
>>> sshd_01_monitor_0 failed with rc=5: Preventing sshd_01 from
>>> re-starting on pbxsrv2
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: determine_online_status: Node
>>> pbxsrv1 is online
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: unpack_rsc_op: Hard error -
>>> sshd_02_monitor_0 failed with rc=5: Preventing sshd_02 from
>>> re-starting on pbxsrv1
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: find_clone: Internally renamed
>>> drbd_02:0 on pbxsrv1 to drbd_02:2 (ORPHAN)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: group_print:  Resource
>>> Group: pbx_service_01
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:      ip_01
>>> (ocf::heartbeat:IPaddr2):       Started pbxsrv1
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:      fs_01
>>> (ocf::heartbeat:Filesystem):    Started pbxsrv1
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:      pbx_01
>>> (lsb:znd-pbx_01):       Started pbxsrv1
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:      sshd_01
>>> (lsb:znd-sshd-pbx_01):  Started pbxsrv1
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:
>>> mailAlert_01       (ocf::heartbeat:MailTo):        Stopped
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: clone_print:  Master/Slave
>>> Set: ms-drbd_01
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: short_print:      Masters: [ pbxsrv1 ]
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: short_print:      Slaves: [ pbxsrv3 ]
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: group_print:  Resource
>>> Group: pbx_service_02
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:      ip_02
>>> (ocf::heartbeat:IPaddr2):       Started pbxsrv2
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:      fs_02
>>> (ocf::heartbeat:Filesystem):    Started pbxsrv2
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:      pbx_02
>>> (lsb:znd-pbx_02):       Started pbxsrv2
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:      sshd_02
>>> (lsb:znd-sshd-pbx_02):  Started pbxsrv2
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: native_print:
>>> mailAlert_02       (ocf::heartbeat:MailTo):        Started pbxsrv2
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: clone_print:  Master/Slave
>>> Set: ms-drbd_02
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: short_print:      Masters: [ pbxsrv2 ]
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: short_print:      Slaves: [ pbxsrv3 ]
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: master_color: Promoting
>>> drbd_01:0 (Master pbxsrv1)
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: master_color: ms-drbd_01:
>>> Promoted 1 instances of a possible 1 to master
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: master_color: Promoting
>>> drbd_01:0 (Master pbxsrv1)
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: master_color: ms-drbd_01:
>>> Promoted 1 instances of a possible 1 to master
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: master_color: Promoting
>>> drbd_02:0 (Master pbxsrv2)
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: master_color: ms-drbd_02:
>>> Promoted 1 instances of a possible 1 to master
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: master_color: Promoting
>>> drbd_02:0 (Master pbxsrv2)
>>> 12:04:44 pbxsrv3 pengine: [6396]: info: master_color: ms-drbd_02:
>>> Promoted 1 instances of a possible 1 to master
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: RecurringOp:  Start
>>> recurring monitor (2s) for mailAlert_01 on pbxsrv1
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> ip_01      (Started pbxsrv1)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> fs_01      (Started pbxsrv1)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> pbx_01     (Started pbxsrv1)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> sshd_01    (Started pbxsrv1)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Start
>>> mailAlert_01        (pbxsrv1)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> drbd_01:0  (Master pbxsrv1)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> drbd_01:1  (Slave pbxsrv3)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> ip_02      (Started pbxsrv2)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> fs_02      (Started pbxsrv2)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> pbx_02     (Started pbxsrv2)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> sshd_02    (Started pbxsrv2)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> mailAlert_02       (Started pbxsrv2)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> drbd_02:0  (Master pbxsrv2)
>>> 12:04:44 pbxsrv3 pengine: [6396]: notice: LogActions: Leave resource
>>> drbd_02:1  (Slave pbxsrv3)
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: do_state_transition: State
>>> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
>>> cause=C_IPC_MESSAGE origin=handle_response ]
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: unpack_graph: Unpacked transition
>>> 5: 4 actions in 4 synapses
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: do_te_invoke: Processing graph 5
>>> (ref=pe_calc-dc-1290683084-100) derived from
>>> /var/lib/pengine/pe-input-12.bz2
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: te_pseudo_action: Pseudo action
>>> 27 fired and confirmed
>>> 12:04:44 pbxsrv3 crmd: [5914]: info: te_rsc_command: Initiating action
>>> 25: start mailAlert_01_start_0 on pbxsrv1
>>> 12:04:45 pbxsrv3 pengine: [6396]: info: process_pe_message: Transition
>>> 5: PEngine Input stored in: /var/lib/pengine/pe-input-12.bz2
>>> 12:04:47 pbxsrv3 crmd: [5914]: info: match_graph_event: Action
>>> mailAlert_01_start_0 (25) confirmed on pbxsrv1 (rc=0)
>>> 12:04:47 pbxsrv3 crmd: [5914]: info: te_pseudo_action: Pseudo action
>>> 28 fired and confirmed
>>> 12:04:47 pbxsrv3 crmd: [5914]: info: te_rsc_command: Initiating action
>>> 26: monitor mailAlert_01_monitor_2000 on pbxsrv1
>>> 12:04:48 pbxsrv3 crmd: [5914]: info: match_graph_event: Action
>>> mailAlert_01_monitor_2000 (26) confirmed on pbxsrv1 (rc=0)
>>> 12:04:48 pbxsrv3 crmd: [5914]: info: run_graph:
>>> ====================================================
>>> 12:04:48 pbxsrv3 crmd: [5914]: notice: run_graph: Transition 5
>>> (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>>> Source=/var/lib/pengine/pe-input-12.bz2): Complete
>>> 12:04:48 pbxsrv3 crmd: [5914]: info: te_graph_trigger: Transition 5 is
>>> now complete
>>> 12:04:48 pbxsrv3 crmd: [5914]: info: notify_crmd: Transition 5 status:
>>> done - <null>
>>> 12:04:48 pbxsrv3 crmd: [5914]: info: do_state_transition: State
>>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> 12:04:48 pbxsrv3 crmd: [5914]: info: do_state_transition: Starting
>>> PEngine Recheck Timer
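In the DC log above, transition 4 ends with "Stopped" (Skipped=4) rather than "Complete": target-role was set to Stopped (epoch 79) and back to Started (epoch 80) within the same second, the second change aborted the transition after only mailAlert_01 had been stopped, and transition 5 then simply started mailAlert_01 again. A filter like the following lists such aborted transitions together with the stored PE input file. It is a sketch with the hypothetical name aborted_transitions, and it assumes one run_graph entry per line, as in the original syslog.

```shell
#!/bin/sh
# Hypothetical filter: print "<transition> <pe-input file>" for every
# run_graph summary that ends in "Stopped" (aborted) instead of "Complete".
aborted_transitions() {
    sed -n 's/.*run_graph: Transition \([0-9]*\) (.*Source=\([^)]*\)): Stopped$/\1 \2/p'
}

# Example, e.g.: aborted_transitions < /var/log/messages
```

For the transition 4 summary above it prints "4 /var/lib/pengine/pe-input-11.bz2"; the transition 5 summary, which ends in "Complete", produces no output.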
>>> 
>>> 
>>> And the configuration:
>>> 
>>> [root at pbxsrv3 ~]# crm configure show
>>> node $id="1db957ef-20b5-43e0-84f7-f3224a7084bc" pbxsrv2
>>> node $id="3f4fcfc7-8da5-4d9c-87a9-01e65e16d44f" pbxsrv3
>>> node $id="8f04b98f-fbe3-479f-bbea-e078d65b2de4" pbxsrv1
>>> primitive drbd_01 ocf:linbit:drbd \
>>>        params drbd_resource="drbd_resource_01" \
>>>        op monitor interval="30s" \
>>>        op start interval="0" timeout="240s" \
>>>        op stop interval="0" timeout="120s"
>>> primitive drbd_02 ocf:linbit:drbd \
>>>        params drbd_resource="drbd_resource_02" \
>>>        op monitor interval="30s" \
>>>        op start interval="0" timeout="240s" \
>>>        op stop interval="0" timeout="120s"
>>> primitive fs_01 ocf:heartbeat:Filesystem \
>>>        params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" \
>>>        meta migration-threshold="3" failure-timeout="60" \
>>>        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
>>>        op start interval="0" timeout="60s" \
>>>        op stop interval="0" timeout="60s"
>>> primitive fs_02 ocf:heartbeat:Filesystem \
>>>        params device="/dev/drbd2" directory="/pbx_service_02" fstype="ext3" \
>>>        meta migration-threshold="3" failure-timeout="60" \
>>>        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
>>>        op start interval="0" timeout="60s" \
>>>        op stop interval="0" timeout="60s"
>>> primitive ip_01 ocf:heartbeat:IPaddr2 \
>>>        params ip="192.168.78.10" nic="eth3" cidr_netmask="24" broadcast="192.168.78.255" \
>>>        meta failure-timeout="120" migration-threshold="3" \
>>>        op monitor interval="5s"
>>> primitive ip_02 ocf:heartbeat:IPaddr2 \
>>>        params ip="192.168.78.30" nic="eth3" cidr_netmask="24" broadcast="192.168.78.255" \
>>>        meta failure-timeout="120" migration-threshold="3" \
>>>        op monitor interval="5s"
>>> primitive mailAlert_01 ocf:heartbeat:MailTo \
>>>        params email="root" subject="[Zanadoo Clustet event] pbx_service_01" \
>>>        op monitor interval="2" timeout="10" \
>>>        op start interval="0" timeout="10" \
>>>        op stop interval="0" timeout="10"
>>> primitive mailAlert_02 ocf:heartbeat:MailTo \
>>>        params email="root" subject="[Zanadoo Clustet event] pbx_service_02" \
>>>        op monitor interval="2" timeout="10" \
>>>        op start interval="0" timeout="10" \
>>>        op stop interval="0" timeout="10"
>>> primitive pbx_01 lsb:znd-pbx_01 \
>>>        meta migration-threshold="3" failure-timeout="60" target-role="Started" \
>>>        op monitor interval="20s" timeout="20s" \
>>>        op start interval="0" timeout="60s" \
>>>        op stop interval="0" timeout="60s"
>>> primitive pbx_02 lsb:znd-pbx_02 \
>>>        meta migration-threshold="3" failure-timeout="60" target-role="Started" \
>>>        op monitor interval="20s" timeout="15s" \
>>>        op start interval="0" timeout="60s" \
>>>        op stop interval="0" timeout="60s"
>>> primitive sshd_01 lsb:znd-sshd-pbx_01 \
>>>        op monitor on-fail="stop" interval="10m" \
>>>        op start interval="0" timeout="60s" on-fail="stop" \
>>>        op stop interval="0" timeout="60s" on-fail="stop"
>>> primitive sshd_02 lsb:znd-sshd-pbx_02 \
>>>        op monitor on-fail="stop" interval="10m" \
>>>        op start interval="0" timeout="60s" on-fail="stop" \
>>>        op stop interval="0" timeout="60s" on-fail="stop"
>>> group pbx_service_01 ip_01 fs_01 pbx_01 sshd_01 mailAlert_01 \
>>>        meta target-role="Started"
>>> group pbx_service_02 ip_02 fs_02 pbx_02 sshd_02 mailAlert_02 \
>>>        meta target-role="Started"
>>> ms ms-drbd_01 drbd_01 \
>>>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
>>> ms ms-drbd_02 drbd_02 \
>>>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
>>> location PrimaryNode-drbd_01 ms-drbd_01 100: pbxsrv1
>>> location PrimaryNode-drbd_02 ms-drbd_02 100: pbxsrv2
>>> location PrimaryNode-pbx_service_01 pbx_service_01 200: pbxsrv1
>>> location PrimaryNode-pbx_service_02 pbx_service_02 200: pbxsrv2
>>> location SecondaryNode-drbd_01 ms-drbd_01 0: pbxsrv3
>>> location SecondaryNode-drbd_02 ms-drbd_02 0: pbxsrv3
>>> location SecondaryNode-pbx_service_01 pbx_service_01 10: pbxsrv3
>>> location SecondaryNode-pbx_service_02 pbx_service_02 10: pbxsrv3
>>> colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master
>>> colocation fs_02-on-drbd_02 inf: fs_02 ms-drbd_02:Master
>>> order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
>>> order pbx_service_02-after-drbd_02 inf: ms-drbd_02:promote pbx_service_02:start
>>> property $id="cib-bootstrap-options" \
>>>        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>>>        cluster-infrastructure="Heartbeat" \
>>>        symmetric-cluster="false" \
>>>        stonith-enabled="false"
>>> rsc_defaults $id="rsc-options" \
>>>        resource-stickiness="1000"
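Given this configuration, stopping pbx_01 also means stopping sshd_01 and mailAlert_01, the members listed after it in pbx_service_01. If the aim is a full stop/start of pbx_01 (and of the members after it), the two steps can also be done separately, waiting for the stop to complete before starting again, so that the two target-role changes are not applied back to back as in the DC log above. A sketch only: stop_wait_start, CRM, CRM_RESOURCE, POLL_INTERVAL and MAX_POLLS are hypothetical, and it assumes that crm_resource --locate prints a line containing "NOT running" once the resource is stopped; check that against the installed version.

```shell
#!/bin/sh
# Hypothetical alternative to "crm resource restart": stop, wait, then start.
# All variable names below are assumptions for this sketch.
CRM=${CRM:-crm}
CRM_RESOURCE=${CRM_RESOURCE:-crm_resource}
POLL_INTERVAL=${POLL_INTERVAL:-1}
MAX_POLLS=${MAX_POLLS:-60}

stop_wait_start() {
    rsc=$1
    "$CRM" resource stop "$rsc" || return 1
    n=0
    while [ "$n" -lt "$MAX_POLLS" ]; do
        # Assumed output format: "resource <id> is NOT running" once stopped.
        if "$CRM_RESOURCE" --resource "$rsc" --locate 2>&1 | grep -q "NOT running"; then
            "$CRM" resource start "$rsc"
            return $?
        fi
        sleep "$POLL_INTERVAL"
        n=$((n + 1))
    done
    echo "timed out waiting for $rsc to stop" >&2
    return 1
}

# Example: stop_wait_start pbx_01
```

Polling instead of sleeping a fixed time means the start is issued only once the cluster reports the resource stopped; if the stop never completes, the function gives up after MAX_POLLS attempts and leaves the resource stopped.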
>>> 
>>> I created bug 2516 for this issue a few weeks ago, and since I haven't
>>> seen any response, I thought I would bring it to the mailing list.
>>> 
>>> Cheers,
>>> Pavlos
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>> 