[Pacemaker] Node stuck in pending state

Digimer lists at alteeve.ca
Wed Apr 9 13:32:28 EDT 2014


When a node enters an unknown state (from the perspective of the rest of 
the cluster), it is extremely unsafe to assume what state it is in. The 
only safe option is to block and call a fence to put the lost node into 
a known state. Only when the fence action confirms that the lost node 
was successfully isolated (rebooted, usually) is it safe for the cluster 
to proceed with recovery.

A properly configured cluster will react to a failed fence by blocking. 
An improperly configured cluster will make assumptions and enter an 
undefined state where it's hard to predict what will happen next, but 
often it's "not good".
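
To make the rule above concrete, here is a minimal sketch in Python. It is not Pacemaker's code; the names `NodeState`, `handle_lost_node` and the `fence` callable are made up for illustration, and the only assumption is that `fence` returns True once the fence device has confirmed the node was isolated.

```python
from enum import Enum
from typing import Callable, Tuple


class NodeState(Enum):
    UNKNOWN = "unknown"  # peer stopped responding; its real state is not known
    FENCED = "fenced"    # the fence action confirmed the peer was isolated


def handle_lost_node(fence: Callable[[str], bool], node: str) -> Tuple[NodeState, bool]:
    """Return (resulting node state, whether recovery may proceed).

    `fence` is a hypothetical callable that returns True only when the
    fence device confirms the node was isolated.
    """
    if fence(node):
        # Known state: safe to recover the lost node's services elsewhere.
        return NodeState.FENCED, True
    # Fence failed: make no assumptions, stay blocked.
    return NodeState.UNKNOWN, False
```

The point of the second branch is that a failed fence never falls through to recovery; the caller stays blocked until a later fence attempt succeeds.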

Take a minute to read this please:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Concept.3B_Fencing

It's about cman + rgmanager, but the concepts port 1:1 to pacemaker.

The best analogy I can think of for fencing is to compare it to 
seatbelts in cars. You don't appreciate their importance when you've 
never had an accident, so often people leave them unbuckled. When you 
crash though, the seatbelt can make all the difference in the world. 
Fencing is like that. I often hear people say "I've been in production 
for over a year without fencing and it was fine!". Of course, they 
didn't crash in that time, so they didn't need fencing before then.
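
As for where fencing shows up in logs like the ones quoted below: stonith-ng writes a result line for each fence action, for example the `log_operation` line at 09:53:21 in the vm6 log. A rough Python sketch for pulling those lines out follows. The regex is written only against that one line format as quoted here, the function name `find_fence_results` is made up, and it assumes each syslog entry is on a single line (the mail client wrapped them in the quote).

```python
import re

# Matches the stonith-ng "log_operation" result line as quoted in this thread.
FENCE_RESULT = re.compile(
    r"stonith-ng\[\d+\]:\s+\w+: log_operation: Operation '(?P<action>\w+)' .*?"
    r"for host '(?P<target>[^']+)' with device '(?P<device>[^']+)' "
    r"returned: (?P<rc>-?\d+)"
)


def find_fence_results(lines):
    """Yield (action, target, device, rc) for each fence result line found."""
    for line in lines:
        m = FENCE_RESULT.search(line)
        if m:
            yield m["action"], m["target"], m["device"], int(m["rc"])


# Two entries from the vm6 log, unwrapped onto single lines.
sample = [
    "Apr  8 09:53:20 lotus-4vm6 corosync[2442]:   [TOTEM ] A processor "
    "failed, forming new configuration.",
    "Apr  8 09:53:21 lotus-4vm6 stonith-ng[2492]:   notice: log_operation: "
    "Operation 'reboot' [3306] (call 2 from crmd.2496) for host 'lotus-4vm5' "
    "with device 'st-fencing' returned: 0 (OK)",
]
results = list(find_fence_results(sample))
```

On a real node you would feed it the lines of /var/log/messages instead of `sample`.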

digimer

On 09/04/14 12:10 PM, Campbell, Gene wrote:
> Thanks for the response.  I hope you don't mind a couple questions along
> the way to understanding this issue.
>
> We have storage attached to vm5
> Power is cut to vm5
> Failover to vm6 happens and storage is made available there
> vm5 reboots
>
> Can you tell where fencing is happening in this picture?  Will keep
> reading docs, and looking at logs, but anything you think you can do to
> help would be much appreciated.
>
> Thanks
> Gene
>
>
>
> On 4/8/14, 2:29 PM, "Digimer" <lists at alteeve.ca> wrote:
>
>> Looks like your fencing (stonith) failed.
>>
>> On 08/04/14 05:25 PM, Campbell, Gene wrote:
>>> Hello fine folks in Pacemaker land.   Hopefully you could share your
>>> insight into this little problem for us.
>>>
>>> We have an intermittent problem with failover.
>>>
>>> two node cluster
>>> first node power is cut
>>> failover begins to second node
>>> first node reboots
>>> crm_mon -1 on the rebooted node is  PENDING (never goes to ONLINE)
>>>
>>> Example output from vm5
>>> Node lotus-4vm5: pending
>>> Online: [ lotus-4vm6 ]
>>>
>>> Example output from vm6
>>> Online: [ lotus-4vm5  lotus-4vm6 ]
>>>
>>> Environment
>>> CentOS 6.5 on KVM VMs
>>> Pacemaker 1.1.10
>>> Corosync 1.4.1
>>>
>>> vm5 /var/log/messages
>>> Apr  8 09:54:07 lotus-4vm5 pacemaker: Starting Pacemaker Cluster Manager
>>> Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: main: Starting
>>> Pacemaker 1.1.10-14.el6_5.2 (Build: 368c726):  generated-manpages
>>> agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc
>>> nagios  corosync-plugin cman
>>> Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: get_node_name:
>>> Defaulting to uname -n for the local classic openais (with plugin) node
>>> name
>>> Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>> delivery failed (rc=-2)
>>> Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>> delivery failed (rc=-2)
>>> Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>> delivery failed (rc=-2)
>>> Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>> delivery failed (rc=-2)
>>> Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>> delivery failed (rc=-2)
>>> Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>> delivery failed (rc=-2)
>>> Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: crm_cluster_connect:
>>> Connecting to cluster infrastructure: classic openais (with plugin)
>>> Apr  8 09:54:07 lotus-4vm5 crmd[1794]:   notice: main: CRM Git Version:
>>> 368c726
>>> Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name:
>>> Defaulting to uname -n for the local classic openais (with plugin) node
>>> name
>>> Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
>>> Recorded connection 0x20b6280 for attrd/0
>>> Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name:
>>> Defaulting to uname -n for the local classic openais (with plugin) node
>>> name
>>> Apr  8 09:54:07 lotus-4vm5 stonith-ng[1790]:   notice:
>>> crm_cluster_connect: Connecting to cluster infrastructure: classic
>>> openais (with plugin)
>>> Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_cluster_connect:
>>> Connecting to cluster infrastructure: classic openais (with plugin)
>>> Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>> delivery failed (rc=-2)
>>> Apr  8 09:54:08 lotus-4vm5 attrd[1792]:   notice: main: Starting
>>> mainloop...
>>> Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name:
>>> Defaulting to uname -n for the local classic openais (with plugin) node
>>> name
>>> Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
>>> Recorded connection 0x20ba600 for stonith-ng/0
>>> Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name:
>>> Defaulting to uname -n for the local classic openais (with plugin) node
>>> name
>>> Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
>>> Recorded connection 0x20be980 for cib/0
>>> Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
>>> Sending membership update 24 to cib
>>> Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name:
>>> Defaulting to uname -n for the local classic openais (with plugin) node
>>> name
>>> Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name:
>>> Defaulting to uname -n for the local classic openais (with plugin) node
>>> name
>>> Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice:
>>> plugin_handle_membership: Membership 24: quorum acquired
>>> Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
>>> member (was (null))
>>> Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now
>>> member (was (null))
>>> Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: crm_cluster_connect:
>>> Connecting to cluster infrastructure: classic openais (with plugin)
>>> Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: get_node_name:
>>> Defaulting to uname -n for the local classic openais (with plugin) node
>>> name
>>> Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
>>> Recorded connection 0x20c2d00 for crmd/0
>>> Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
>>> Sending membership update 24 to crmd
>>> Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: get_node_name:
>>> Defaulting to uname -n for the local classic openais (with plugin) node
>>> name
>>> Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice:
>>> plugin_handle_membership: Membership 24: quorum acquired
>>> Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
>>> member (was (null))
>>> Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now
>>> member (was (null))
>>> Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: do_started: The local
>>> CRM is operational
>>> Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: do_state_transition:
>>> State transition S_STARTING -> S_PENDING [ input=I_PENDING
>>> cause=C_FSA_INTERNAL origin=do_started ]
>>> Apr  8 09:54:09 lotus-4vm5 stonith-ng[1790]:   notice: setup_cib:
>>> Watching for stonith topology changes
>>> Apr  8 09:54:09 lotus-4vm5 stonith-ng[1790]:   notice: unpack_config:
>>> On loss of CCM Quorum: Ignore
>>> Apr  8 09:54:10 lotus-4vm5 stonith-ng[1790]:   notice:
>>> stonith_device_register: Added 'st-fencing' to the device list (1 active
>>> devices)
>>> Apr  8 09:54:10 lotus-4vm5 cib[1789]:   notice:
>>> cib_server_process_diff: Not applying diff 0.31.21 -> 0.31.22 (sync in
>>> progress)
>>> Apr  8 09:54:29 lotus-4vm5 crmd[1794]:  warning: do_log: FSA: Input
>>> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
>>> Apr  8 09:56:29 lotus-4vm5 crmd[1794]:    error: crm_timer_popped:
>>> Election Timeout (I_ELECTION_DC) just popped in state S_ELECTION!
>>> (120000ms)
>>> Apr  8 09:56:29 lotus-4vm5 crmd[1794]:   notice: do_state_transition:
>>> State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
>>> cause=C_TIMER_POPPED origin=crm_timer_popped ]
>>> Apr  8 09:56:29 lotus-4vm5 crmd[1794]:  warning: do_log: FSA: Input
>>> I_RELEASE_DC from do_election_count_vote() received in state
>>> S_INTEGRATION
>>> Apr  8 09:56:29 lotus-4vm5 crmd[1794]:  warning: join_query_callback:
>>> No DC for join-1
>>>
>>>
>>> vm6 /var/log/messages
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
>>> pcmk_peer_update: Transitional membership event on ring 16: memb=1,
>>> new=0, lost=0
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: memb: lotus-4vm6 3192917514
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
>>> pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1,
>>> lost=0
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> update_member: Node 3176140298/lotus-4vm5 is now: member
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: NEW:  lotus-4vm5 3176140298
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: MEMB: lotus-4vm5 3176140298
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: MEMB: lotus-4vm6 3192917514
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> send_member_notification: Sending membership update 16 to 2 children
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [TOTEM ] A processor
>>> joined or left the membership and a new membership was formed.
>>> Apr  8 09:52:51 lotus-4vm6 crmd[2496]:   notice:
>>> plugin_handle_membership: Membership 16: quorum acquired
>>> Apr  8 09:52:51 lotus-4vm6 crmd[2496]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
>>> member (was lost)
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> update_member: 0x1284140 Node 3176140298 (lotus-4vm5) born on: 16
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> send_member_notification: Sending membership update 16 to 2 children
>>> Apr  8 09:52:51 lotus-4vm6 cib[2491]:   notice:
>>> plugin_handle_membership: Membership 16: quorum acquired
>>> Apr  8 09:52:51 lotus-4vm6 cib[2491]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
>>> member (was lost)
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [CPG   ] chosen downlist:
>>> sender r(0) ip(10.14.80.189) r(1) ip(10.128.0.189) ; members(old:1
>>> left:0)
>>> Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [MAIN  ] Completed service
>>> synchronization, ready to provide service.
>>> Apr  8 09:52:57 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:53:14 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:53:15 lotus-4vm6 stonith-ng[2492]:  warning: parse_host_line:
>>> Could not parse (38 47): "console"
>>> Apr  8 09:53:20 lotus-4vm6 corosync[2442]:   [TOTEM ] A processor
>>> failed, forming new configuration.
>>> Apr  8 09:53:21 lotus-4vm6 stonith-ng[2492]:   notice: log_operation:
>>> Operation 'reboot' [3306] (call 2 from crmd.2496) for host 'lotus-4vm5'
>>> with device 'st-fencing' returned: 0 (OK)
>>> Apr  8 09:53:21 lotus-4vm6 crmd[2496]:   notice: erase_xpath_callback:
>>> Deletion of "//node_state[@uname='lotus-4vm5']/lrm": Timer expired
>>> (rc=-62)
>>> Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
>>> pcmk_peer_update: Transitional membership event on ring 20: memb=1,
>>> new=0, lost=1
>>> Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: memb: lotus-4vm6 3192917514
>>> Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: lost: lotus-4vm5 3176140298
>>> Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
>>> pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0,
>>> lost=0
>>> Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: MEMB: lotus-4vm6 3192917514
>>> Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> ais_mark_unseen_peer_dead: Node lotus-4vm5 was not seen in the previous
>>> transition
>>> Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> update_member: Node 3176140298/lotus-4vm5 is now: lost
>>> Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> send_member_notification: Sending membership update 20 to 2 children
>>> Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [TOTEM ] A processor
>>> joined or left the membership and a new membership was formed.
>>> Apr  8 09:53:26 lotus-4vm6 cib[2491]:   notice:
>>> plugin_handle_membership: Membership 20: quorum lost
>>> Apr  8 09:53:26 lotus-4vm6 cib[2491]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
>>> lost (was member)
>>> Apr  8 09:53:26 lotus-4vm6 crmd[2496]:   notice:
>>> plugin_handle_membership: Membership 20: quorum lost
>>> Apr  8 09:53:26 lotus-4vm6 crmd[2496]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
>>> lost (was member)
>>> Apr  8 09:53:34 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:53:43 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:54:01 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
>>> pcmk_peer_update: Transitional membership event on ring 24: memb=1,
>>> new=0, lost=0
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: memb: lotus-4vm6 3192917514
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
>>> pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1,
>>> lost=0
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> update_member: Node 3176140298/lotus-4vm5 is now: member
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: NEW:  lotus-4vm5 3176140298
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: MEMB: lotus-4vm5 3176140298
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> pcmk_peer_update: MEMB: lotus-4vm6 3192917514
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> send_member_notification: Sending membership update 24 to 2 children
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [TOTEM ] A processor
>>> joined or left the membership and a new membership was formed.
>>> Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice:
>>> plugin_handle_membership: Membership 24: quorum acquired
>>> Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
>>> member (was lost)
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> update_member: 0x1284140 Node 3176140298 (lotus-4vm5) born on: 24
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
>>> send_member_notification: Sending membership update 24 to 2 children
>>> Apr  8 09:54:04 lotus-4vm6 cib[2491]:   notice:
>>> plugin_handle_membership: Membership 24: quorum acquired
>>> Apr  8 09:54:04 lotus-4vm6 cib[2491]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
>>> member (was lost)
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [CPG   ] chosen downlist:
>>> sender r(0) ip(10.14.80.190) r(1) ip(10.128.0.190) ; members(old:2
>>> left:1)
>>> Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [MAIN  ] Completed service
>>> synchronization, ready to provide service.
>>> Apr  8 09:54:04 lotus-4vm6 stonith-ng[2492]:   notice: remote_op_done:
>>> Operation reboot of lotus-4vm5 by lotus-4vm6 for
>>> crmd.2496@lotus-4vm6.ae82b411: OK
>>> Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice:
>>> tengine_stonith_callback: Stonith operation
>>> 2/13:0:0:f325afae-64b0-4812-a897-70556ab1e806: OK (0)
>>> Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice:
>>> tengine_stonith_notify: Peer lotus-4vm5 was terminated (reboot) by
>>> lotus-4vm6 for lotus-4vm6: OK (ref=ae82b411-b07a-4235-be55-5a30a00b323b)
>>> by client crmd.2496
>>> Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice: crm_update_peer_state:
>>> send_stonith_update: Node lotus-4vm5[3176140298] - state is now lost
>>> (was member)
>>> Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice: run_graph: Transition
>>> 0 (Complete=1, Pending=0, Fired=0, Skipped=7, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-warn-25.bz2): Stopped
>>> Apr  8 09:54:04 lotus-4vm6 attrd[2494]:   notice: attrd_local_callback:
>>> Sending full refresh (origin=crmd)
>>> Apr  8 09:54:04 lotus-4vm6 attrd[2494]:   notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: probe_complete (true)
>>> Apr  8 09:54:05 lotus-4vm6 pengine[2495]:   notice: unpack_config: On
>>> loss of CCM Quorum: Ignore
>>> Apr  8 09:54:05 lotus-4vm6 pengine[2495]:   notice: LogActions: Start
>>> st-fencing#011(lotus-4vm6)
>>> Apr  8 09:54:05 lotus-4vm6 pengine[2495]:   notice: LogActions: Start
>>> MGS_607d26#011(lotus-4vm6)
>>> Apr  8 09:54:05 lotus-4vm6 pengine[2495]:   notice: process_pe_message:
>>> Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-912.bz2
>>> Apr  8 09:54:05 lotus-4vm6 crmd[2496]:   notice: te_rsc_command:
>>> Initiating action 5: start st-fencing_start_0 on lotus-4vm6 (local)
>>> Apr  8 09:54:05 lotus-4vm6 crmd[2496]:   notice: te_rsc_command:
>>> Initiating action 6: start MGS_607d26_start_0 on lotus-4vm6 (local)
>>> Apr  8 09:54:05 lotus-4vm6 stonith-ng[2492]:   notice:
>>> stonith_device_register: Device 'st-fencing' already existed in device
>>> list (1 active devices)
>>> Apr  8 09:54:05 lotus-4vm6 kernel: LDISKFS-fs warning (device sda):
>>> ldiskfs_multi_mount_protect: MMP interval 42 higher than expected,
>>> please wait.
>>> Apr  8 09:54:05 lotus-4vm6 kernel:
>>> Apr  8 09:54:10 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:54:11 lotus-4vm6 crmd[2496]:  warning: get_rsc_metadata: No
>>> metadata found for fence_chroma::stonith:heartbeat: Input/output error
>>> (-5)
>>> Apr  8 09:54:11 lotus-4vm6 crmd[2496]:   notice: process_lrm_event: LRM
>>> operation st-fencing_start_0 (call=24, rc=0, cib-update=89,
>>> confirmed=true) ok
>>> Apr  8 09:54:11 lotus-4vm6 crmd[2496]:  warning: crmd_cs_dispatch:
>>> Recieving messages from a node we think is dead: lotus-4vm5[-1118826998]
>>> Apr  8 09:54:24 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:54:31 lotus-4vm6 crmd[2496]:   notice:
>>> do_election_count_vote: Election 2 (current: 2, owner: lotus-4vm5):
>>> Processed vote from lotus-4vm5 (Peer is not part of our cluster)
>>> Apr  8 09:54:34 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:54:46 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:54:48 lotus-4vm6 kernel: LDISKFS-fs (sda): recovery complete
>>> Apr  8 09:54:48 lotus-4vm6 kernel: LDISKFS-fs (sda): mounted filesystem
>>> with ordered data mode. quota=on. Opts:
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [ [ ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [   { ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [     "args": [ ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [       "mount",  ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [       "-t",  ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [       "lustre",  ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [
>>> "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1",  ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [       "/mnt/MGS" ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [     ],  ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [     "rc": 0,  ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [     "stderr": "",  ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [     "stdout": "" ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [   } ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [ ] ]
>>> Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
>>> MGS_607d26_start_0:3444:stderr [  ]
>>> Apr  8 09:54:48 lotus-4vm6 crmd[2496]:   notice: process_lrm_event: LRM
>>> operation MGS_607d26_start_0 (call=26, rc=0, cib-update=94,
>>> confirmed=true) ok
>>> Apr  8 09:54:49 lotus-4vm6 crmd[2496]:   notice: run_graph: Transition
>>> 1 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-912.bz2): Stopped
>>> Apr  8 09:54:49 lotus-4vm6 attrd[2494]:   notice: attrd_local_callback:
>>> Sending full refresh (origin=crmd)
>>> Apr  8 09:54:49 lotus-4vm6 attrd[2494]:   notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: probe_complete (true)
>>> Apr  8 09:54:50 lotus-4vm6 pengine[2495]:   notice: unpack_config: On
>>> loss of CCM Quorum: Ignore
>>> Apr  8 09:54:50 lotus-4vm6 pengine[2495]:   notice: process_pe_message:
>>> Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-913.bz2
>>> Apr  8 09:54:50 lotus-4vm6 crmd[2496]:   notice: te_rsc_command:
>>> Initiating action 9: monitor MGS_607d26_monitor_5000 on lotus-4vm6
>>> (local)
>>> Apr  8 09:54:51 lotus-4vm6 crmd[2496]:   notice: process_lrm_event: LRM
>>> operation MGS_607d26_monitor_5000 (call=30, rc=0, cib-update=102,
>>> confirmed=false) ok
>>> Apr  8 09:54:51 lotus-4vm6 crmd[2496]:   notice: run_graph: Transition
>>> 2 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-913.bz2): Complete
>>> Apr  8 09:54:51 lotus-4vm6 crmd[2496]:   notice: do_state_transition:
>>> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> Apr  8 09:55:07 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:55:23 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:55:38 lotus-4vm6 kernel: Lustre: Evicted from MGS (at
>>> 10.14.80.190@tcp) after server handle changed from 0x7acffb201664d0a4 to
>>> 0x9a6b02eee57f3dba
>>> Apr  8 09:55:38 lotus-4vm6 kernel: Lustre: MGC10.14.80.189@tcp:
>>> Connection restored to MGS (at 0@lo)
>>> Apr  8 09:55:42 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:55:58 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:56:12 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:56:26 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:56:31 lotus-4vm6 crmd[2496]:  warning: crmd_ha_msg_filter:
>>> Another DC detected: lotus-4vm5 (op=join_offer)
>>> Apr  8 09:56:31 lotus-4vm6 crmd[2496]:   notice: do_state_transition:
>>> State transition S_IDLE -> S_ELECTION [ input=I_ELECTION
>>> cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
>>> Apr  8 09:56:31 lotus-4vm6 crmd[2496]:   notice: do_state_transition:
>>> State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
>>> cause=C_FSA_INTERNAL origin=do_election_check ]
>>> Apr  8 09:56:31 lotus-4vm6 crmd[2496]:   notice:
>>> do_election_count_vote: Election 3 (current: 3, owner: lotus-4vm6):
>>> Processed no-vote from lotus-4vm5 (Peer is not part of our cluster)
>>> Apr  8 09:56:36 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
>>> 10.14.80.1 port 67 (xid=0x78d16782)
>>> Apr  8 09:56:37 lotus-4vm6 crmd[2496]:  warning: get_rsc_metadata: No
>>> metadata found for fence_chroma::stonith:heartbeat: Input/output error
>>> (-5)
>>> Apr  8 09:56:37 lotus-4vm6 attrd[2494]:   notice: attrd_local_callback:
>>> Sending full refresh (origin=crmd)
>>> Apr  8 09:56:37 lotus-4vm6 attrd[2494]:   notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: probe_complete (true)
>>> Apr  8 09:56:38 lotus-4vm6 pengine[2495]:   notice: unpack_config: On
>>> loss of CCM Quorum: Ignore
>>> Apr  8 09:56:38 lotus-4vm6 pengine[2495]:   notice: process_pe_message:
>>> Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-914.bz2
>>> Apr  8 09:56:38 lotus-4vm6 crmd[2496]:   notice: run_graph: Transition
>>> 3 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-914.bz2): Complete
>>> Apr  8 09:56:38 lotus-4vm6 crmd[2496]:   notice: do_state_transition:
>>> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>
>>> Thank you very much
>>> Gene
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
>


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



