[Pacemaker] error: do_exit: Could not recover from internal error
Andrew Beekhof
andrew at beekhof.net
Wed May 22 23:05:05 UTC 2013
On 22/05/2013, at 9:44 PM, Brian J. Murrell <brian at interlinx.bc.ca> wrote:
> Using pacemaker 1.1.8-7 on EL6, I got the following series of events while
> trying to shut down pacemaker and then corosync. The corosync shutdown
> (service corosync stop) ended up spinning/hanging indefinitely (~7 hours
> now). The events included a:
>
> May 21 23:47:18 node1 crmd[17598]: error: do_exit: Could not recover from internal error
>
> For completeness, I've included the logging from the whole session,
> from corosync startup until it all went haywire. The badness seems to
> start at about 23:43:18.
>
> May 21 23:42:51 node1 corosync[17541]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
> May 21 23:42:51 node1 corosync[17541]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
> May 21 23:42:51 node1 corosync[17541]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> May 21 23:42:51 node1 corosync[17541]: [TOTEM ] Initializing transport (UDP/IP Multicast).
> May 21 23:42:51 node1 corosync[17541]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> May 21 23:42:51 node1 corosync[17541]: [TOTEM ] Initializing transport (UDP/IP Multicast).
> May 21 23:42:51 node1 corosync[17541]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> May 21 23:42:51 node1 corosync[17541]: [TOTEM ] The network interface [192.168.122.253] is now up.
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: process_ais_conf: Reading configure
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
>
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
Can I take a brief moment to highlight those last two lines...
And also refer you to http://blog.clusterlabs.org/blog/2013/pacemaker-on-rhel6-dot-4/ which discusses the fencing issues that affect plugin-based clusters.
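For reference (and as a guess at what your corosync.conf contains), the plugin is loaded by a service block along these lines; moving to CMAN removes it entirely:

    service {
        # Loads the Pacemaker plugin.  ver: 1 means pacemakerd is started
        # separately by the pacemaker init script (MCP mode), which matches
        # the "Enabling MCP mode" line further down in your log.
        name: pacemaker
        ver:  1
    }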
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: config_find_init: Local handle: 5650605097994944513 for logging
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: config_find_next: Processing additional logging options...
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: get_config_opt: Found 'no' for option: to_logfile
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: get_config_opt: Found 'yes' for option: to_syslog
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: config_find_init: Local handle: 2730409743423111170 for quorum
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: config_find_next: No additional configuration supplied for: quorum
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: get_config_opt: No default for option: provider
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: config_find_init: Local handle: 5880381755227111427 for service
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: config_find_next: Processing additional service options...
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: get_config_opt: Found '1' for option: ver
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: process_ais_conf: Enabling MCP mode: Use the Pacemaker init script to complete Pacemaker startup
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: pcmk_startup: CRM: Initialized
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] Logging: Initialized pcmk_startup
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: pcmk_startup: Service: 9
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: pcmk_startup: Local hostname: node1
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: pcmk_update_nodeid: Local node id: 4252674240
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: update_member: Creating entry for node 4252674240 born on 0
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: update_member: 0x12c4640 Node 4252674240 now known as node1 (was: (null))
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: update_member: Node node1 now has 1 quorum votes (was 0)
> May 21 23:42:51 node1 corosync[17541]: [pcmk ] info: update_member: Node 4252674240/node1 is now: member
> May 21 23:42:51 node1 corosync[17541]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.8
> May 21 23:42:51 node1 corosync[17541]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> May 21 23:42:51 node1 corosync[17541]: [SERV ] Service engine loaded: corosync configuration service
> May 21 23:42:51 node1 corosync[17541]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> May 21 23:42:51 node1 corosync[17541]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> May 21 23:42:51 node1 corosync[17541]: [SERV ] Service engine loaded: corosync profile loading service
> May 21 23:42:51 node1 corosync[17541]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> May 21 23:42:51 node1 corosync[17541]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> May 21 23:42:51 node1 corosync[17541]: [TOTEM ] The network interface [10.0.0.253] is now up.
> May 21 23:42:52 node1 corosync[17541]: [TOTEM ] Incrementing problem counter for seqid 1 iface 10.0.0.253 to [1 of 10]
> May 21 23:42:52 node1 corosync[17541]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
> May 21 23:42:52 node1 corosync[17541]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
> May 21 23:42:52 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: NEW: node1 4252674240
> May 21 23:42:52 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: MEMB: node1 4252674240
> May 21 23:42:52 node1 corosync[17541]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> May 21 23:42:52 node1 corosync[17541]: [CPG ] chosen downlist: sender r(0) ip(192.168.122.253) r(1) ip(10.0.0.253) ; members(old:0 left:0)
> May 21 23:42:52 node1 corosync[17541]: [MAIN ] Completed service synchronization, ready to provide service.
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 8: memb=1, new=0, lost=0
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: memb: node1 4252674240
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 8: memb=2, new=1, lost=0
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: update_member: Creating entry for node 2608507072 born on 8
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: update_member: Node 2608507072/unknown is now: member
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: NEW: .pending. 2608507072
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 2608507072
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: MEMB: node1 4252674240
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: send_member_notification: Sending membership update 8 to 0 children
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: update_member: 0x12c4640 Node 4252674240 ((null)) born on: 8
> May 21 23:42:54 node1 corosync[17541]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: update_member: 0x12c8b50 Node 2608507072 (node2) born on: 8
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: update_member: 0x12c8b50 Node 2608507072 now known as node2 (was: (null))
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: update_member: Node node2 now has 1 quorum votes (was 0)
> May 21 23:42:54 node1 corosync[17541]: [pcmk ] info: send_member_notification: Sending membership update 8 to 0 children
> May 21 23:42:54 node1 corosync[17541]: [CPG ] chosen downlist: sender r(0) ip(192.168.122.155) r(1) ip(10.0.0.155) ; members(old:1 left:0)
> May 21 23:42:54 node1 corosync[17541]: [MAIN ] Completed service synchronization, ready to provide service.
> May 21 23:42:54 node1 corosync[17541]: [TOTEM ] ring 1 active with no faults
> May 21 23:42:56 node1 pacemakerd[17587]: notice: main: Starting Pacemaker 1.1.8-7.wc1.el6 (Build: 394e906): generated-manpages agent-manpages ascii-docs ncurses libqb-logging libqb-ipc lha-fencing corosync-plugin cman
> May 21 23:42:56 node1 pacemakerd[17587]: notice: get_local_node_name: Defaulting to uname(2).nodename for the local classic openais (with plugin) node name
> May 21 23:42:56 node1 pacemakerd[17587]: notice: update_node_processes: 0x1ec7cc0 Node 4252674240 now known as node1, was:
> May 21 23:42:56 node1 stonith-ng[17594]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> May 21 23:42:56 node1 cib[17593]: notice: main: Using new config location: /var/lib/pacemaker/cib
> May 21 23:42:56 node1 cib[17593]: warning: retrieveCib: Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
> May 21 23:42:56 node1 cib[17593]: warning: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> May 21 23:42:56 node1 cib[17593]: warning: readCibXmlFile: Continuing with an empty configuration.
> May 21 23:42:56 node1 attrd[17596]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> May 21 23:42:56 node1 corosync[17541]: [pcmk ] info: pcmk_ipc: Recorded connection 0x12d6560 for stonith-ng/0
> May 21 23:42:56 node1 crmd[17598]: notice: main: CRM Git Version: 394e906
> May 21 23:42:56 node1 corosync[17541]: [pcmk ] info: pcmk_ipc: Recorded connection 0x12da8e0 for attrd/0
> May 21 23:42:56 node1 attrd[17596]: notice: main: Starting mainloop...
> May 21 23:42:56 node1 cib[17593]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> May 21 23:42:56 node1 corosync[17541]: [pcmk ] info: pcmk_ipc: Recorded connection 0x12df4c0 for cib/0
> May 21 23:42:56 node1 corosync[17541]: [pcmk ] info: pcmk_ipc: Sending membership update 8 to cib
> May 21 23:42:56 node1 cib[17593]: notice: ais_dispatch_message: Membership 8: quorum acquired
> May 21 23:42:56 node1 cib[17593]: notice: crm_update_peer_state: crm_update_ais_node: Node node1[4252674240] - state is now member
> May 21 23:42:56 node1 cib[17593]: notice: crm_update_peer_state: crm_update_ais_node: Node node2[2608507072] - state is now member
> May 21 23:42:57 node1 pacemakerd[17587]: notice: update_node_processes: 0x1ec8c20 Node 2608507072 now known as node2, was:
> May 21 23:42:57 node1 stonith-ng[17594]: notice: setup_cib: Watching for stonith topology changes
> May 21 23:42:57 node1 crmd[17598]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> May 21 23:42:57 node1 corosync[17541]: [pcmk ] info: pcmk_ipc: Recorded connection 0x12e4340 for crmd/0
> May 21 23:42:57 node1 corosync[17541]: [pcmk ] info: pcmk_ipc: Sending membership update 8 to crmd
> May 21 23:42:57 node1 crmd[17598]: notice: ais_dispatch_message: Membership 8: quorum acquired
> May 21 23:42:57 node1 crmd[17598]: notice: crm_update_peer_state: crm_update_ais_node: Node node1[4252674240] - state is now member
> May 21 23:42:57 node1 crmd[17598]: notice: crm_update_peer_state: crm_update_ais_node: Node node2[2608507072] - state is now member
> May 21 23:42:57 node1 crmd[17598]: error: crmd_ais_dispatch: Recieving messages from a node we think is dead: node1[-42293056]
> May 21 23:42:57 node1 crmd[17598]: notice: do_started: The local CRM is operational
> May 21 23:42:58 node1 cib[17593]: warning: cib_process_diff: Diff 0.0.0 -> 0.0.1 from node2 not applied to 0.0.1: current "num_updates" is greater than required
> May 21 23:42:58 node1 crmd[17598]: error: crmd_ais_dispatch: Recieving messages from a node we think is dead: node2[-1686460224]
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 12: memb=2, new=0, lost=0
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: memb: node2 2608507072
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: memb: node1 4252674240
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 12: memb=3, new=1, lost=0
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] info: update_member: Creating entry for node 2474289344 born on 12
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] info: update_member: Node 2474289344/unknown is now: member
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: NEW: .pending. 2474289344
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 2474289344
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: MEMB: node2 2608507072
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] info: pcmk_peer_update: MEMB: node1 4252674240
> May 21 23:43:01 node1 corosync[17541]: [pcmk ] info: send_member_notification: Sending membership update 12 to 2 children
> May 21 23:43:01 node1 corosync[17541]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> May 21 23:43:01 node1 crmd[17598]: notice: crm_update_peer_state: crm_update_ais_node: Node (null)[2474289344] - state is now member
> May 21 23:43:01 node1 cib[17593]: notice: crm_update_peer_state: crm_update_ais_node: Node (null)[2474289344] - state is now member
> May 21 23:43:18 node1 crmd[17598]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> May 21 23:45:18 node1 crmd[17598]: error: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_ELECTION! (120000ms)
> May 21 23:45:18 node1 crmd[17598]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: Diff: --- 0.0.3
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: Diff: +++ 0.1.1 2d896a081763a7b2ca54f82877fbbc84
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: -- <cib admin_epoch="0" epoch="0" num_updates="3" />
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: ++ <cluster_property_set id="cib-bootstrap-options" >
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.8-7.wc1.el6-394e906" />
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: ++ </cluster_property_set>
> May 21 23:45:18 node1 cib[17593]: notice: log_cib_diff: cib:diff: Local-only Change: 0.2.1
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: -- <cib admin_epoch="0" epoch="1" num_updates="1" />
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)" />
> May 21 23:45:18 node1 cib[17593]: notice: log_cib_diff: cib:diff: Local-only Change: 0.3.1
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: -- <cib admin_epoch="0" epoch="2" num_updates="1" />
> May 21 23:45:18 node1 cib[17593]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2" />
> May 21 23:46:33 node1 pacemakerd[17587]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
> May 21 23:46:33 node1 pacemakerd[17587]: notice: stop_child: Stopping crmd: Sent -15 to process 17598
> May 21 23:46:33 node1 crmd[17598]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
> May 21 23:47:18 node1 crmd[17598]: error: do_log: FSA: Input I_ERROR from feature_update_callback() received in state S_INTEGRATION
> May 21 23:47:18 node1 crmd[17598]: warning: do_state_transition: State transition S_INTEGRATION -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=feature_update_callback ]
> May 21 23:47:18 node1 crmd[17598]: error: do_recover: Action A_RECOVER (0000000001000000) not supported
> May 21 23:47:18 node1 crmd[17598]: warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
> May 21 23:47:18 node1 crmd[17598]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> May 21 23:47:18 node1 crmd[17598]: notice: terminate_cs_connection: Disconnecting from Corosync
> May 21 23:47:18 node1 corosync[17541]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x12e4340, async-conn=0x12e4340) left
> May 21 23:47:18 node1 crmd[17598]: error: do_exit: Could not recover from internal error
> May 21 23:47:18 node1 pacemakerd[17587]: error: pcmk_child_exit: Child process crmd exited (pid=17598, rc=2)
> May 21 23:47:33 node1 pacemakerd[17587]: error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
> May 21 23:47:33 node1 pacemakerd[17587]: notice: stop_child: Stopping pengine: Sent -15 to process 17597
> May 21 23:47:48 node1 pacemakerd[17587]: error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
> May 21 23:47:48 node1 pacemakerd[17587]: notice: stop_child: Stopping attrd: Sent -15 to process 17596
> May 21 23:47:48 node1 attrd[17596]: notice: main: Exiting...
> May 21 23:47:48 node1 corosync[17541]: [pcmk ] info: pcmk_ipc_exit: Client attrd (conn=0x12da8e0, async-conn=0x12da8e0) left
> May 21 23:48:03 node1 pacemakerd[17587]: error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
> May 21 23:48:03 node1 pacemakerd[17587]: notice: stop_child: Stopping lrmd: Sent -15 to process 17595
> May 21 23:48:18 node1 pacemakerd[17587]: error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
> May 21 23:48:18 node1 pacemakerd[17587]: notice: stop_child: Stopping stonith-ng: Sent -15 to process 17594
> May 21 23:48:18 node1 corosync[17541]: [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x12d6560, async-conn=0x12d6560) left
> May 21 23:48:33 node1 pacemakerd[17587]: error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
> May 21 23:48:33 node1 pacemakerd[17587]: notice: stop_child: Stopping cib: Sent -15 to process 17593
> May 21 23:48:38 node1 cib[17593]: notice: cib_force_exit: Forcing exit!
> May 21 23:48:38 node1 corosync[17541]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x12df4c0, async-conn=0x12df4c0) left
> May 21 23:48:38 node1 pacemakerd[17587]: error: pcmk_child_exit: Child process cib exited (pid=17593, rc=64)
> May 21 23:48:53 node1 pacemakerd[17587]: error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
> May 21 23:48:53 node1 pacemakerd[17587]: notice: pcmk_shutdown_worker: Shutdown complete
>
> At this point I tried to shut down corosync as described above, but it
> just hung/spun indefinitely during shutdown.
It looks to me like pacemaker shut down...
Is the other node in this cluster running the same version of pacemaker?
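(If you're not sure, compare the output of the following on both nodes:)

    rpm -q pacemaker corosync
    crm_mon --version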
>
> Any ideas what went wrong here?
It seems to have started with
> May 21 23:42:58 node1 crmd[17598]: error: crmd_ais_dispatch: Recieving messages from a node we think is dead: node2[-1686460224]
which led to
> May 21 23:45:18 node1 crmd[17598]: error: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_ELECTION! (120000ms)
and
> May 21 23:47:18 node1 crmd[17598]: error: do_log: FSA: Input I_ERROR from feature_update_callback() received in state S_INTEGRATION
This is not a happy cluster. Are these real machines or VMs?
Firewall config ok?
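If iptables is running on either node, corosync's UDP traffic needs to be let through. A rough sketch, assuming the default mcastport of 5405 (adjust to whatever your corosync.conf actually uses):

    # corosync uses mcastport and mcastport - 1
    iptables -I INPUT -p udp -m multiport --dports 5404,5405 -j ACCEPT
    service iptables save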
Also, 1.1.8-7 was not tested with the plugin _at_all_ (and neither will future RHEL builds be).
I highly recommend moving to CMAN for RHEL6.
Check out the quickstart: http://clusterlabs.org/quickstart-redhat.html
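From memory, the short version looks roughly like this (the quickstart has the full fence_pcmk pass-through setup; the cluster name below is just a placeholder):

    yum install -y pacemaker cman ccs resource-agents
    # CMAN provides membership/quorum, Pacemaker manages the resources
    ccs -f /etc/cluster/cluster.conf --createcluster mycluster
    ccs -f /etc/cluster/cluster.conf --addnode node1
    ccs -f /etc/cluster/cluster.conf --addnode node2
    # ...then wire CMAN fencing through fence_pcmk as per the quickstart, and:
    service cman start
    service pacemaker start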