[ClusterLabs] [Question:pacemaker_remote] Because of an operation that the cluster cannot carry out on the remote node, the resource does not move (STONITH is not carried out either)
Andrew Beekhof
andrew at beekhof.net
Tue Aug 18 01:17:47 UTC 2015
Should be fixed now. Thanks for the report!
> On 12 Aug 2015, at 1:20 pm, renayama19661014 at ybb.ne.jp wrote:
>
> Hi All,
>
> We tested the behavior of pacemaker_remote (version: pacemaker-ad1f397a8228a63949f86c96597da5cecc3ed977).
>
> The cluster is configured as follows:
> * bl460g8n3 (KVM host)
> * bl460g8n4 (KVM host)
> * pgsr01 (guest on the bl460g8n3 host)
> * pgsr02 (guest on the bl460g8n4 host)
>
>
> Step 1) I set up a cluster with simple resources.
>
> [root@bl460g8n3 ~]# crm_mon -1 -Af
> Last updated: Wed Aug 12 11:52:27 2015 Last change: Wed Aug 12 11:51:47 2015 by root via crm_resource on bl460g8n4
> Stack: corosync
> Current DC: bl460g8n3 (version 1.1.13-ad1f397) - partition with quorum
> 4 nodes and 10 resources configured
>
> Online: [ bl460g8n3 bl460g8n4 ]
> GuestOnline: [ pgsr01@bl460g8n3 pgsr02@bl460g8n4 ]
>
> prmDB1 (ocf::heartbeat:VirtualDomain): Started bl460g8n3
> prmDB2 (ocf::heartbeat:VirtualDomain): Started bl460g8n4
> Resource Group: grpStonith1
> prmStonith1-2 (stonith:external/ipmi): Started bl460g8n4
> Resource Group: grpStonith2
> prmStonith2-2 (stonith:external/ipmi): Started bl460g8n3
> Resource Group: master-group
> vip-master (ocf::heartbeat:Dummy): Started pgsr02
> vip-rep (ocf::heartbeat:Dummy): Started pgsr02
> Master/Slave Set: msPostgresql [pgsql]
> Masters: [ pgsr02 ]
> Slaves: [ pgsr01 ]
>
> Node Attributes:
> * Node bl460g8n3:
> * Node bl460g8n4:
> * Node pgsr01@bl460g8n3:
> + master-pgsql : 5
> * Node pgsr02@bl460g8n4:
> + master-pgsql : 10
>
> Migration Summary:
> * Node bl460g8n4:
> * Node bl460g8n3:
> * Node pgsr02@bl460g8n4:
> * Node pgsr01@bl460g8n3:
>
>
> Step 2) I cause pacemaker_remote to fail on pgsr02.
>
> [root@pgsr02 ~]# ps -ef | grep remote
> root 1171 1 0 11:52 ? 00:00:00 /usr/sbin/pacemaker_remoted
> root 1428 1377 0 11:53 pts/0 00:00:00 grep --color=auto remote
> [root@pgsr02 ~]# kill -9 1171
>
>
> Step 3) After the failure, the master-group resource does not start on pgsr01.
>
> [root@bl460g8n3 ~]# crm_mon -1 -Af
> Last updated: Wed Aug 12 11:54:04 2015 Last change: Wed Aug 12 11:51:47 2015 by root via crm_resource on bl460g8n4
> Stack: corosync
> Current DC: bl460g8n3 (version 1.1.13-ad1f397) - partition with quorum
> 4 nodes and 10 resources configured
>
> Online: [ bl460g8n3 bl460g8n4 ]
> GuestOnline: [ pgsr01@bl460g8n3 ]
>
> prmDB1 (ocf::heartbeat:VirtualDomain): Started bl460g8n3
> prmDB2 (ocf::heartbeat:VirtualDomain): FAILED bl460g8n4
> Resource Group: grpStonith1
> prmStonith1-2 (stonith:external/ipmi): Started bl460g8n4
> Resource Group: grpStonith2
> prmStonith2-2 (stonith:external/ipmi): Started bl460g8n3
> Master/Slave Set: msPostgresql [pgsql]
> Masters: [ pgsr01 ]
>
> Node Attributes:
> * Node bl460g8n3:
> * Node bl460g8n4:
> * Node pgsr01@bl460g8n3:
> + master-pgsql : 10
>
> Migration Summary:
> * Node bl460g8n4:
> pgsr02: migration-threshold=1 fail-count=1 last-failure='Wed Aug 12 11:53:39 2015'
> * Node bl460g8n3:
> * Node pgsr01@bl460g8n3:
>
> Failed Actions:
> * pgsr02_monitor_30000 on bl460g8n4 'unknown error' (1): call=2, status=Error, exitreason='none',
> last-rc-change='Wed Aug 12 11:53:39 2015', queued=0ms, exec=0ms
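The difference between the two crm_mon outputs can also be checked mechanically. The following is only a minimal sketch in Python, not part of Pacemaker: the helper names guest_nodes and lost_guests are made up for illustration, and the sample strings are shortened copies of the status lines from Step 1 and Step 3.

```python
import re


def guest_nodes(crm_mon_text):
    """Return the set of guest node names listed on the GuestOnline line."""
    m = re.search(r"^GuestOnline:\s*\[(.*?)\]", crm_mon_text, re.MULTILINE)
    if not m:
        return set()
    # Archived copies of crm_mon output may show " at " where the tool printed "@".
    entries = m.group(1).replace(" at ", "@").split()
    # Each entry has the form guest@host; keep only the guest name.
    return {entry.split("@", 1)[0] for entry in entries}


def lost_guests(before, after):
    """Return guest nodes that were online in `before` but are missing in `after`."""
    return sorted(guest_nodes(before) - guest_nodes(after))


# Shortened status lines taken from Step 1 and Step 3 (illustrative input only).
BEFORE = "Online: [ bl460g8n3 bl460g8n4 ]\nGuestOnline: [ pgsr01@bl460g8n3 pgsr02@bl460g8n4 ]\n"
AFTER = "Online: [ bl460g8n3 bl460g8n4 ]\nGuestOnline: [ pgsr01@bl460g8n3 ]\n"

if __name__ == "__main__":
    print("guest nodes no longer online:", lost_guests(BEFORE, AFTER))
```

In a real test the two strings would come from captured `crm_mon -1 -Af` output rather than from constants.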
>
>
> The cause seems to be that STONITH is not carried out.
> The pgsql demote operation on pgsr02, which the cluster cannot execute, seems to block the start on pgsr01.
> --------------------------------------------------------------------------------------
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: Graph 10 with 20 actions: batch-limit=20 jobs, network-delay=0ms
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 4]: Pending rsc op prmDB2_stop_0 on bl460g8n4 (priority: 0, waiting: 70)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 36]: Completed pseudo op master-group_stop_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 34]: Completed pseudo op master-group_start_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 82]: Completed rsc op pgsql_post_notify_demote_0 on pgsr01 (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 81]: Completed rsc op pgsql_pre_notify_demote_0 on pgsr01 (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 78]: Completed rsc op pgsql_post_notify_stop_0 on pgsr01 (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 77]: Completed rsc op pgsql_pre_notify_stop_0 on pgsr01 (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 67]: Completed pseudo op msPostgresql_confirmed-post_notify_demoted_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 66]: Completed pseudo op msPostgresql_post_notify_demoted_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 65]: Completed pseudo op msPostgresql_confirmed-pre_notify_demote_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 64]: Completed pseudo op msPostgresql_pre_notify_demote_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 63]: Completed pseudo op msPostgresql_demoted_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 62]: Completed pseudo op msPostgresql_demote_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 55]: Completed pseudo op msPostgresql_confirmed-post_notify_stopped_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 54]: Completed pseudo op msPostgresql_post_notify_stopped_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 53]: Completed pseudo op msPostgresql_confirmed-pre_notify_stop_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 52]: Completed pseudo op msPostgresql_pre_notify_stop_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 51]: Completed pseudo op msPostgresql_stopped_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 50]: Completed pseudo op msPostgresql_stop_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 70]: Pending rsc op pgsr02_stop_0 on bl460g8n4 (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: * [Input 38]: Unresolved dependency rsc op pgsql_demote_0 on pgsr02
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: info: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> --------------------------------------------------------------------------------------
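In a graph dump like the one above, the entries of interest are the few that are not marked Completed. The following is a rough sketch in Python for pulling them out; the regular expression is an assumption derived only from the line format visible in this log, not from any documented crmd output format, and the name blocked_entries is hypothetical.

```python
import re

# Pattern inferred from the crmd lines quoted above, for example:
#   [Action 4]: Pending rsc op prmDB2_stop_0 on bl460g8n4 (priority: 0, waiting: 70)
#   * [Input 38]: Unresolved dependency rsc op pgsql_demote_0 on pgsr02
ENTRY = re.compile(
    r"\[(?P<kind>Action|Input) (?P<id>\d+)\]: "
    r"(?P<state>Pending|Completed|Unresolved dependency) "
    r"(?P<type>rsc|pseudo) op (?P<op>\S+) on (?P<node>\S+)"
)


def blocked_entries(log_text):
    """Return (state, id, op, node) for every graph entry that is not Completed."""
    found = []
    for line in log_text.splitlines():
        m = ENTRY.search(line)
        if m and m.group("state") != "Completed":
            found.append(
                (m.group("state"), int(m.group("id")), m.group("op"), m.group("node"))
            )
    return found


# Four lines copied from the log above (illustrative input only).
SAMPLE = """\
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 4]: Pending rsc op prmDB2_stop_0 on bl460g8n4 (priority: 0, waiting: 70)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 36]: Completed pseudo op master-group_stop_0 on N/A (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 70]: Pending rsc op pgsr02_stop_0 on bl460g8n4 (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: * [Input 38]: Unresolved dependency rsc op pgsql_demote_0 on pgsr02
"""

if __name__ == "__main__":
    for state, ident, op, node in blocked_entries(SAMPLE):
        print(f"{state}: {op} on {node} (id {ident})")
```

Applied to the full log, this should leave only the two pending stop operations on bl460g8n4 and the unresolved pgsql_demote_0 on pgsr02.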
>
> Is there a setting that makes the cluster carry out STONITH properly?
> Is this a bug in pacemaker_remote?
>
> * I registered this issue in Bugzilla (http://bugs.clusterlabs.org/show_bug.cgi?id=5247).
> * I also attached a crm_report to the Bugzilla entry.
>
> Best Regards,
> Hideo Yamauchi.
>