[Pacemaker] pacemaker unable to start

Wed Oct 21 11:49:42 EDT 2009

I recommend upgrading to corosync 1.1.1. It contains several bug fixes,
one of which is critical for proper pacemaker operation.  It won't fix
this particular problem, however.

Corosync loads pacemaker by searching for a pacemaker lcrso file.  These
files are installed in /usr/libexec/lcrso by default, but may be in a
different location depending on your distribution.
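
As a rough illustration of the search described above, here is a minimal
Python sketch that lists any .lcrso files in a few candidate directories.
Only /usr/libexec/lcrso comes from this message; the other directories and
the function name find_lcrso_files are assumptions made for illustration,
and this is not how corosync itself performs the lookup.

```python
import os

# Only /usr/libexec/lcrso is named in the message above; the other
# directories are assumed guesses for other distributions.
CANDIDATE_DIRS = [
    "/usr/libexec/lcrso",
    "/usr/lib/lcrso",
    "/usr/lib64/lcrso",
]


def find_lcrso_files(search_dirs=CANDIDATE_DIRS):
    """Return the sorted paths of all *.lcrso files in the given directories."""
    found = []
    for directory in search_dirs:
        if not os.path.isdir(directory):
            continue  # skip locations that do not exist on this system
        for name in os.listdir(directory):
            if name.endswith(".lcrso"):
                found.append(os.path.join(directory, name))
    return sorted(found)


if __name__ == "__main__":
    paths = find_lcrso_files()
    if not paths:
        print("no .lcrso files found in the candidate directories")
    for path in paths:
        print(path)
```

If nothing pacemaker-related shows up in the output, that would be
consistent with the "Service failed to load 'pacemaker'" line in the log
below, though it does not by itself prove the cause.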

Regards
-steve

On Wed, 2009-10-21 at 11:13 -0400, Shravan Mishra wrote:
> Hello guys,
> 
> We are running 
> 
> corosync-1.0.0
> heartbeat-2.99.1
> pacemaker-1.0.4
> 
> The corosync.conf under /etc/corosync/ is:
> 
> ============
> # Please read the corosync.conf.5 manual page
> compatibility: whitetank
> 
> aisexec {
>        user: root
>        group: root
> }
> totem {
>        version: 2
>        secauth: off
>        threads: 0
>        interface {
>                ringnumber: 0
>                bindnetaddr: 172.30.0.0
>                mcastaddr: 226.94.1.1
>                mcastport: 5406
>        }
> }
> 
> logging {
>        fileline: off
>        to_stderr: yes
>        to_logfile: yes
>        to_syslog: yes
>        logfile: /tmp/corosync.log
>        debug: on
>        timestamp: on
>        logger_subsys {
>                subsys: pacemaker
>                debug: on
>                tags: enter|leave|trace1|trace2| trace3|trace4|trace6
>        }
> }
> 
> 
> service {
>        name: pacemaker
>        ver: 0
>        # use_mgmtd: yes
>        # use_logd: yes
> }
> 
> 
> corosync {
>        user: root
>        group: root
> }
> 
> 
> amf {
>        mode: disabled
> }
> ============
> 
> 
> # service corosync start
> 
> starts the messaging but fails to load pacemaker.
> 
> /tmp/corosync.log shows:
> 
> ==================
> 
> Oct 21 11:05:43 corosync [MAIN  ] Corosync Cluster Engine ('trunk'):
> started and ready to provide service.
> Oct 21 11:05:43 corosync [MAIN  ] Successfully read main configuration
> file '/etc/corosync/corosync.conf'.
> Oct 21 11:05:43 corosync [TOTEM ] Token Timeout (1000 ms) retransmit
> timeout (238 ms)
> Oct 21 11:05:43 corosync [TOTEM ] token hold (180 ms) retransmits
> before loss (4 retrans)
> Oct 21 11:05:43 corosync [TOTEM ] join (50 ms) send_join (0 ms)
> consensus (800 ms) merge (200 ms)
> Oct 21 11:05:43 corosync [TOTEM ] downcheck (1000 ms) fail to recv
> const (50 msgs)
> Oct 21 11:05:43 corosync [TOTEM ] seqno unchanged const (30 rotations)
> Maximum network MTU 1500
> Oct 21 11:05:43 corosync [TOTEM ] window size per rotation (50
> messages) maximum messages per rotation (17 messages)
> Oct 21 11:05:43 corosync [TOTEM ] send threads (0 threads)
> Oct 21 11:05:43 corosync [TOTEM ] RRP token expired timeout (238 ms)
> Oct 21 11:05:43 corosync [TOTEM ] RRP token problem counter (2000 ms)
> Oct 21 11:05:43 corosync [TOTEM ] RRP threshold (10 problem count)
> Oct 21 11:05:43 corosync [TOTEM ] RRP mode set to none.
> Oct 21 11:05:43 corosync [TOTEM ] heartbeat_failures_allowed (0)
> Oct 21 11:05:43 corosync [TOTEM ] max_network_delay (50 ms)
> Oct 21 11:05:43 corosync [TOTEM ] HeartBeat is Disabled. To enable set
> heartbeat_failures_allowed > 0
> Oct 21 11:05:43 corosync [TOTEM ] Initializing transmit/receive
> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Oct 21 11:05:43 corosync [TOTEM ] Receive multicast socket recv buffer
> size (262142 bytes).
> Oct 21 11:05:43 corosync [TOTEM ] Transmit multicast socket send
> buffer size (262142 bytes).
> Oct 21 11:05:43 corosync [TOTEM ] The network interface [172.30.0.145]
> is now up.
> Oct 21 11:05:43 corosync [TOTEM ] Created or loaded sequence id
> 184.172.30.0.145 for this ring.
> Oct 21 11:05:43 corosync [TOTEM ] entering GATHER state from 15.
> Oct 21 11:05:43 corosync [SERV  ] Service failed to load 'pacemaker'.
> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
> extended virtual synchrony service'
> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
> configuration service'
> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
> cluster closed process group service v1.01'
> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
> cluster config database access v1.01'
> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
> profile loading service'
> Oct 21 11:05:43 corosync [MAIN  ] Compatibility mode set to
> whitetank.  Using V1 and V2 of the synchronization engine.
> Oct 21 11:05:43 corosync [TOTEM ] Creating commit token because I am
> the rep.
> Oct 21 11:05:43 corosync [TOTEM ] Saving state aru 0 high seq received
> 0
> Oct 21 11:05:43 corosync [TOTEM ] Storing new sequence id for ring bc
> Oct 21 11:05:43 corosync [TOTEM ] entering COMMIT state.
> Oct 21 11:05:43 corosync [TOTEM ] got commit token
> Oct 21 11:05:43 corosync [TOTEM ] entering RECOVERY state.
> Oct 21 11:05:43 corosync [TOTEM ] position [0] member 172.30.0.145:
> Oct 21 11:05:43 corosync [TOTEM ] previous ring seq 184 rep
> 172.30.0.145
> Oct 21 11:05:43 corosync [TOTEM ] aru 0 high delivered 0 received flag
> 1
> Oct 21 11:05:43 corosync [TOTEM ] Did not need to originate any
> messages in recovery.
> Oct 21 11:05:43 corosync [TOTEM ] got commit token
> Oct 21 11:05:43 corosync [TOTEM ] Sending initial ORF token
> Oct 21 11:05:43 corosync [TOTEM ] token retrans flag is 0 my set
> retrans flag0 retrans queue empty 1 count 0, aru 0
> Oct 21 11:05:43 corosync [TOTEM ] install seq 0 aru 0 high seq
> received 0
> Oct 21 11:05:43 corosync [TOTEM ] token retrans flag is 0 my set
> retrans flag0 retrans queue empty 1 count 1, aru 0
> Oct 21 11:05:43 corosync [TOTEM ] install seq 0 aru 0 high seq
> received 0
> Oct 21 11:05:43 corosync [TOTEM ] token retrans flag is 0 my set
> retrans flag0 retrans queue empty 1 count 2, aru 0
> Oct 21 11:05:43 corosync [TOTEM ] install seq 0 aru 0 high seq
> received 0
> Oct 21 11:05:43 corosync [TOTEM ] token retrans flag is 0 my set
> retrans flag0 retrans queue empty 1 count 3, aru 0
> Oct 21 11:05:43 corosync [TOTEM ] install seq 0 aru 0 high seq
> received 0
> Oct 21 11:05:43 corosync [TOTEM ] retrans flag count 4 token aru 0
> install seq 0 aru 0 0
> Oct 21 11:05:43 corosync [TOTEM ] recovery to regular 1-0
> Oct 21 11:05:43 corosync [TOTEM ] Delivering to app 1 to 0
> Oct 21 11:05:43 corosync [SYNC  ] This node is within the primary
> component and will provide service.
> Oct 21 11:05:43 corosync [TOTEM ] entering OPERATIONAL state.
> Oct 21 11:05:43 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Oct 21 11:05:43 corosync [TOTEM ] mcasted message added to pending
> queue
> Oct 21 11:05:43 corosync [TOTEM ] Delivering 0 to 1
> Oct 21 11:05:43 corosync [TOTEM ] Delivering MCAST message with seq 1
> to pending delivery queue
> Oct 21 11:05:43 corosync [SYNC  ] confchg entries 1
> Oct 21 11:05:43 corosync [SYNC  ] Barrier Start Received From
> -1862263124
> Oct 21 11:05:43 corosync [SYNC  ] Barrier completion status for nodeid
> -1862263124 = 1.
> ==================
> 
> 
> 
> 
> I'm curious to know how corosync/openais actually loads pacemaker. The
> config directive seems to be what does the magic, but apparently not in
> my case.
> What should I be looking for? The log message gives hardly any
> information.
> 
> 
> Pacemaker comprises a bunch of daemons such as crmd and stonithd. I
> ran them individually to check for permission problems with directories
> like /var/lib/heartbeat and /var/run/heartbeat, which should be owned by
> hacluster:haclient.
> 
> 
> 
> 
> Even after doing that, pacemaker still fails to load.
> 
> 
> 
> 
> Please advise me on what I should do.
> 
> 
> 
> 
> Thanks
> Shravan