[Pacemaker] can't get pacemaker started

Sun Jul 29 23:47:48 EDT 2012

On Fri, Jul 27, 2012 at 9:31 AM, Dave Jiang <dave.jiang at intel.com> wrote:
> Hi. I'm following the cluster from scratch guide to create a simple
> active/passive 2 node cluster. I'm using the standard packages that come
> with Fedora 17. I have corosync running and linked up. However I cannot
> seem to get Pacemaker to run correctly. I don't see all the processes
> loaded:
>
> 17286 ?        Ss     0:00 /usr/sbin/pacemakerd -f
> 17288 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
>
> Looking at the log these stand out:
>
> Jul 26 16:26:02 leftnode cib[17378]:  warning: retrieveCib: Cluster
> configuration not found: /var/lib/heartbeat/crm/cib.xml
> Jul 26 16:26:02 leftnode attrd[17381]:   notice: crm_cluster_connect:
> Connecting to cluster infrastructure: corosync
> Jul 26 16:26:02 leftnode cib[17378]:  warning: readCibXmlFile: Primary
> configuration corrupt or unusable, trying backup...
> Jul 26 16:26:02 leftnode crmd[17383]:     info: crm_log_init_worker:
> Changed active directory to /var/lib/heartbeat/cores/hacluster
> Jul 26 16:26:02 leftnode cib[17378]:  warning: readCibXmlFile:
> Continuing with an empty configuration.
>
> I'm not running heartbeat. Should I be? The guide doesn't mention it.

No. Some of these paths were chosen back when Pacemaker was part
of Heartbeat.
The last few are being changed for 1.1.8.

>
> Then I noticed that qb_rb_chmod failed, along with a bunch of other
> failures. Any idea what I am not setting up correctly?

That looks quite odd.
Which user are you starting corosync as?

Could you show the rpm version of corosync and libqb please?
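
To see at a glance which daemons corosync turned away, the quoted log can be
scanned for the libqb "Error in connection setup" lines and the ids matched
against the "name[pid]:" prefixes. The following is only a rough sketch, not
part of Pacemaker or corosync: the function name refused_clients is made up
for illustration, and reading the id in parentheses as
<corosync pid>-<client pid>-<fd> is an assumption inferred from the pids in
this log.

```python
import re

# libqb setup failures as they appear in the quoted log, e.g.
# "Error in connection setup (16373-17381-254): Operation not permitted (1)"
# Assumption: the three numbers are server pid, client pid, and an fd.
SETUP_ERR = re.compile(r"Error in connection\s+setup \((\d+)-(\d+)-(\d+)\)")

# The "name[pid]:" prefix that each daemon puts on its log lines.
DAEMON = re.compile(r"(\S+)\[(\d+)\]:")


def refused_clients(log_text):
    """Return {client_pid: daemon_name} for each failed connection setup."""
    # The mail client wrapped long lines and prefixed them with "> ";
    # strip the quoting and join everything back into one string.
    flat = " ".join(line.lstrip("> ") for line in log_text.splitlines())

    # Map every pid seen in a "name[pid]:" prefix to its daemon name.
    names = {int(pid): name for name, pid in DAEMON.findall(flat)}

    refused = {}
    for _server_pid, client_pid, _fd in SETUP_ERR.findall(flat):
        pid = int(client_pid)
        refused[pid] = names.get(pid, "unknown")
    return refused
```

Run against the log in this message, it should point at attrd (17381) and
cib (17378), the two children that pacemakerd reports exiting with rc=100.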

>
> Jul 26 16:26:02 leftnode crmd[17383]:   notice: main: CRM Git Version:
> ee0730e13d124c3d58f00016c3376a1de5323cff
> Jul 26 16:26:02 leftnode corosync[16373]:   [QB    ]
> qb_rb_chmod:cpg-request-16373-17381-254: Operation not permitted (1)
> Jul 26 16:26:02 leftnode cib[17378]:     info: validate_with_relaxng:
> Creating RNG parser context
> Jul 26 16:26:02 leftnode corosync[16373]:   [QB    ] shm connection
> FAILED: Operation not permitted (1)
> Jul 26 16:26:02 leftnode corosync[16373]:   [QB    ] Error in connection
> setup (16373-17381-254): Operation not permitted (1)
> Jul 26 16:26:02 leftnode attrd[17381]:    error: init_cpg_connection:
> Could not connect to the Cluster Process Group API: 2
> Jul 26 16:26:02 leftnode stonith-ng[17379]:     info:
> init_ais_connection_once: Connection to 'corosync': established
> Jul 26 16:26:02 leftnode attrd[17381]:    error: main: HA Signon failed
> Jul 26 16:26:02 leftnode stonith-ng[17379]:     info: crm_new_peer: Node
> leftnode now has id: 16820416
> Jul 26 16:26:02 leftnode attrd[17381]:    error: main: Aborting startup
> Jul 26 16:26:02 leftnode stonith-ng[17379]:     info: crm_new_peer: Node
> 16820416 is now known as leftnode
> Jul 26 16:26:02 leftnode pacemakerd[17377]:    error: pcmk_child_exit:
> Child process attrd exited (pid=17381, rc=100)
> Jul 26 16:26:02 leftnode pacemakerd[17377]:  warning: pcmk_child_exit:
> Pacemaker child process attrd no longer wishes to be respawned. Shutting
> ourselves down.
> Jul 26 16:26:02 leftnode pacemakerd[17377]:   notice:
> pcmk_shutdown_worker: Shuting down Pacemaker
> Jul 26 16:26:02 leftnode pacemakerd[17377]:   notice: stop_child:
> Stopping crmd: Sent -15 to process 17383
> Jul 26 16:26:02 leftnode crmd[17383]:     info: do_cib_control: Could
> not connect to the CIB service: connection failed
> Jul 26 16:26:02 leftnode cib[17378]:     info: startCib: CIB
> Initialization completed successfully
> Jul 26 16:26:02 leftnode crmd[17383]:  warning: do_cib_control: Couldn't
> complete CIB registration 1 times... pause and retry
> Jul 26 16:26:02 leftnode cib[17378]:     info: get_cluster_type: Cluster
> type is: 'corosync'
> Jul 26 16:26:02 leftnode crmd[17383]:     info: crm_signal_dispatch:
> Invoking handler for signal 15: Terminated
> Jul 26 16:26:02 leftnode cib[17378]:   notice: crm_cluster_connect:
> Connecting to cluster infrastructure: corosync
> Jul 26 16:26:02 leftnode crmd[17383]:   notice: crm_shutdown: Requesting
> shutdown, upper limit is 1200000ms
> Jul 26 16:26:02 leftnode crmd[17383]:  warning: do_log: FSA: Input
> I_SHUTDOWN from crm_shutdown() received in state S_STARTING
> Jul 26 16:26:02 leftnode corosync[16373]:   [QB    ]
> qb_rb_chmod:cpg-request-16373-17378-255: Operation not permitted (1)
> Jul 26 16:26:02 leftnode crmd[17383]:   notice: do_state_transition:
> State transition S_STARTING -> S_STOPPING [ input=I_SHUTDOWN
> cause=C_SHUTDOWN origin=crm_shutdown ]
> Jul 26 16:26:02 leftnode crmd[17383]:     info: get_cluster_type:
> Cluster type is: 'corosync'
> Jul 26 16:26:02 leftnode corosync[16373]:   [QB    ] shm connection
> FAILED: Operation not permitted (1)
> Jul 26 16:26:02 leftnode crmd[17383]:   notice:
> terminate_ais_connection: Disconnecting from Corosync
> Jul 26 16:26:02 leftnode corosync[16373]:   [QB    ] Error in connection
> setup (16373-17378-255): Operation not permitted (1)
> Jul 26 16:26:02 leftnode cib[17378]:    error: init_cpg_connection:
> Could not connect to the Cluster Process Group API: 2
> Jul 26 16:26:02 leftnode crmd[17383]:     info:
> terminate_ais_connection: No CPG connection
> Jul 26 16:26:02 leftnode cib[17378]:     crit: cib_init: Cannot sign in
> to the cluster... terminating
> Jul 26 16:26:02 leftnode crmd[17383]:     info:
> terminate_ais_connection: No Quorum connection
> Jul 26 16:26:02 leftnode pacemakerd[17377]:    error: pcmk_child_exit:
> Child process cib exited (pid=17378, rc=100)
> Jul 26 16:26:02 leftnode crmd[17383]:     info: do_ha_control:
> Disconnected from OpenAIS
> Jul 26 16:26:02 leftnode pacemakerd[17377]:  warning: pcmk_child_exit:
> Pacemaker child process cib no longer wishes to be respawned. Shutting
> ourselves down.
> Jul 26 16:26:02 leftnode crmd[17383]:     info: do_cib_control:
> Disconnecting CIB
> Jul 26 16:26:02 leftnode crmd[17383]:     info: do_exit: Performing
> A_EXIT_0 - gracefully exiting the CRMd
> Jul 26 16:26:02 leftnode crmd[17383]:     info: free_mem: Dropping
> I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> Jul 26 16:26:02 leftnode crmd[17383]:     info: crm_xml_cleanup:
> Cleaning up memory from libxml2
> Jul 26 16:26:02 leftnode crmd[17383]:     info: do_exit: [crmd] stopped (0)
> Jul 26 16:26:02 leftnode pacemakerd[17377]:     info: pcmk_child_exit:
> Child process crmd exited (pid=17383, rc=0)
> Jul 26 16:26:02 leftnode pacemakerd[17377]:   notice: stop_child:
> Stopping pengine: Sent -15 to process 17382
> Jul 26 16:26:02 leftnode pacemakerd[17377]:     info: pcmk_child_exit:
> Child process pengine exited (pid=17382, rc=0)
> Jul 26 16:26:02 leftnode pacemakerd[17377]:   notice: stop_child:
> Stopping lrmd: Sent -15 to process 17380
> Jul 26 16:26:02 leftnode lrmd: [17380]: info: lrmd is shutting down
> Jul 26 16:26:02 leftnode pacemakerd[17377]:     info: pcmk_child_exit:
> Child process lrmd exited (pid=17380, rc=0)
> Jul 26 16:26:02 leftnode pacemakerd[17377]:   notice: stop_child:
> Stopping stonith-ng: Sent -15 to process 17379