[Pacemaker] Error: cluster is not currently running on this node
Miha
miha at softnet.si
Wed Aug 20 14:22:39 UTC 2014
OK, will do that. This will not affect sip2, right?
Sorry for the noob question, but I must be careful as this is in production ;)
So, "fence_bladecenter_snmp reboot", right?
br
miha
On 8/19/2014 11:53 AM, emmanuel segura wrote:
> sorry,
>
> That was a typo; I meant: "try to power off sip1 by hand, using
> fence_bladecenter_snmp in your shell".
>
> 2014-08-19 11:17 GMT+02:00 Miha <miha at softnet.si>:
>> hi,
>>
>> what do you mean by "by had for poweroff sp1"? Do you mean power off server sip1?
>>
>> One thing also bothers me: why is the cluster service not running on
>> sip2, while the virtual IP etc. are still all running properly?
>>
>> tnx
>> miha
>>
>>
>> On 8/19/2014 9:08 AM, emmanuel segura wrote:
>>
>>> Your config looks ok. Have you tried to use fence_bladecenter_snmp by
>>> had for poweroff sp1?
>>>
>>> http://www.linuxcertif.com/man/8/fence_bladecenter_snmp/
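>>>
>>> Also, if you end up editing cluster.conf, a quick sanity check
>>> (assuming the cman tools are installed on your nodes) is:
>>>
>>> ccs_config_validate
>>>
>>> which validates /etc/cluster/cluster.conf against the schema before
>>> you restart anything.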
>>>
>>> 2014-08-19 8:05 GMT+02:00 Miha <miha at softnet.si>:
>>>> sorry, here it is:
>>>>
>>>> <cluster config_version="9" name="sipproxy">
>>>>   <fence_daemon/>
>>>>   <clusternodes>
>>>>     <clusternode name="sip1" nodeid="1">
>>>>       <fence>
>>>>         <method name="pcmk-method">
>>>>           <device name="pcmk-redirect" port="sip1"/>
>>>>         </method>
>>>>       </fence>
>>>>     </clusternode>
>>>>     <clusternode name="sip2" nodeid="2">
>>>>       <fence>
>>>>         <method name="pcmk-method">
>>>>           <device name="pcmk-redirect" port="sip2"/>
>>>>         </method>
>>>>       </fence>
>>>>     </clusternode>
>>>>   </clusternodes>
>>>>   <cman expected_votes="1" two_node="1"/>
>>>>   <fencedevices>
>>>>     <fencedevice agent="fence_pcmk" name="pcmk-redirect"/>
>>>>   </fencedevices>
>>>>   <rm>
>>>>     <failoverdomains/>
>>>>     <resources/>
>>>>   </rm>
>>>> </cluster>
>>>>
>>>>
>>>> br
>>>> miha
>>>>
>>>> On 8/18/2014 11:33 AM, emmanuel segura wrote:
>>>>> Can you show your cman /etc/cluster/cluster.conf?
>>>>>
>>>>> 2014-08-18 7:08 GMT+02:00 Miha <miha at softnet.si>:
>>>>>> Hi Emmanuel,
>>>>>>
>>>>>> this is my config:
>>>>>>
>>>>>>
>>>>>> Pacemaker Nodes:
>>>>>>  sip1 sip2
>>>>>>
>>>>>> Resources:
>>>>>>  Master: ms_drbd_mysql
>>>>>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>>>>>>   Resource: p_drbd_mysql (class=ocf provider=linbit type=drbd)
>>>>>>    Attributes: drbd_resource=clusterdb_res
>>>>>>    Operations: monitor interval=29s role=Master (p_drbd_mysql-monitor-29s)
>>>>>>                monitor interval=31s role=Slave (p_drbd_mysql-monitor-31s)
>>>>>>  Group: g_mysql
>>>>>>   Resource: p_fs_mysql (class=ocf provider=heartbeat type=Filesystem)
>>>>>>    Attributes: device=/dev/drbd0 directory=/var/lib/mysql_drbd fstype=ext4
>>>>>>    Meta Attrs: target-role=Started
>>>>>>   Resource: p_ip_mysql (class=ocf provider=heartbeat type=IPaddr2)
>>>>>>    Attributes: ip=XXX.XXX.XXX.XXX cidr_netmask=24 nic=eth2
>>>>>>   Resource: p_mysql (class=ocf provider=heartbeat type=mysql)
>>>>>>    Attributes: datadir=/var/lib/mysql_drbd/data/ user=root group=root
>>>>>>      config=/var/lib/mysql_drbd/my.cnf pid=/var/run/mysqld/mysqld.pid
>>>>>>      socket=/var/lib/mysql/mysql.sock binary=/usr/bin/mysqld_safe
>>>>>>      additional_parameters="--bind-address=212.13.249.55 --user=root"
>>>>>>    Meta Attrs: target-role=Started
>>>>>>    Operations: start interval=0 timeout=120s (p_mysql-start-0)
>>>>>>                stop interval=0 timeout=120s (p_mysql-stop-0)
>>>>>>                monitor interval=20s timeout=30s (p_mysql-monitor-20s)
>>>>>>  Clone: cl_ping
>>>>>>   Meta Attrs: interleave=true
>>>>>>   Resource: p_ping (class=ocf provider=pacemaker type=ping)
>>>>>>    Attributes: name=ping multiplier=1000 host_list=XXX.XXX.XXX.XXXX
>>>>>>    Operations: monitor interval=15s timeout=60s (p_ping-monitor-15s)
>>>>>>                start interval=0s timeout=60s (p_ping-start-0s)
>>>>>>                stop interval=0s (p_ping-stop-0s)
>>>>>>  Resource: opensips (class=lsb type=opensips)
>>>>>>   Meta Attrs: target-role=Started
>>>>>>   Operations: start interval=0 timeout=120 (opensips-start-0)
>>>>>>               stop interval=0 timeout=120 (opensips-stop-0)
>>>>>>
>>>>>> Stonith Devices:
>>>>>>  Resource: fence_sip1 (class=stonith type=fence_bladecenter_snmp)
>>>>>>   Attributes: action=off ipaddr=172.30.0.2 port=8 community=test login=snmp8 passwd=soft1234
>>>>>>   Meta Attrs: target-role=Started
>>>>>>  Resource: fence_sip2 (class=stonith type=fence_bladecenter_snmp)
>>>>>>   Attributes: action=off ipaddr=172.30.0.2 port=9 community=test1 login=snmp8 passwd=soft1234
>>>>>>   Meta Attrs: target-role=Started
>>>>>> Fencing Levels:
>>>>>>
>>>>>> Location Constraints:
>>>>>>   Resource: ms_drbd_mysql
>>>>>>     Constraint: l_drbd_master_on_ping
>>>>>>       Rule: score=-INFINITY role=Master boolean-op=or (id:l_drbd_master_on_ping-rule)
>>>>>>         Expression: not_defined ping (id:l_drbd_master_on_ping-expression)
>>>>>>         Expression: ping lte 0 type=number (id:l_drbd_master_on_ping-expression-0)
>>>>>> Ordering Constraints:
>>>>>>   promote ms_drbd_mysql then start g_mysql (INFINITY) (id:o_drbd_before_mysql)
>>>>>>   g_mysql then start opensips (INFINITY) (id:opensips_after_mysql)
>>>>>> Colocation Constraints:
>>>>>>   g_mysql with ms_drbd_mysql (INFINITY) (with-rsc-role:Master) (id:c_mysql_on_drbd)
>>>>>>   opensips with g_mysql (INFINITY) (id:c_opensips_on_mysql)
>>>>>>
>>>>>> Cluster Properties:
>>>>>>  cluster-infrastructure: cman
>>>>>>  dc-version: 1.1.10-14.el6-368c726
>>>>>>  no-quorum-policy: ignore
>>>>>>  stonith-enabled: true
>>>>>> Node Attributes:
>>>>>>  sip1: standby=off
>>>>>>  sip2: standby=off
>>>>>>
>>>>>>
>>>>>> br
>>>>>> miha
>>>>>>
>>>>>> On 8/14/2014 3:05 PM, emmanuel segura wrote:
>>>>>>
>>>>>>> ncomplete=10, Source=/var/lib/pacemaker/pengine/pe-warn-7.bz2): Stopped
>>>>>>> Jul 03 14:10:51 [2701] sip2 crmd: notice: too_many_st_failures: No devices found in cluster to fence sip1, giving up
>>>>>>>
>>>>>>> Jul 03 14:10:54 [2697] sip2 stonith-ng: info: stonith_command: Processed st_query reply from sip2: OK (0)
>>>>>>> Jul 03 14:10:54 [2697] sip2 stonith-ng: error: remote_op_done: Operation reboot of sip1 by sip2 for stonith_admin.cman.28299 at sip2.94474607: No such device
>>>>>>>
>>>>>>> Jul 03 14:10:54 [2697] sip2 stonith-ng: info: stonith_command: Processed st_notify reply from sip2: OK (0)
>>>>>>> Jul 03 14:10:54 [2701] sip2 crmd: notice: tengine_stonith_notify: Peer sip1 was not terminated (reboot) by sip2 for sip2: No such device (ref=94474607-8cd2-410d-bbf7-5bc7df614a50) by client stonith_admin.cman.28299
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>>>>>
>>>>>>> Sorry for the short answer. Have you tested your cluster fencing? Can
>>>>>>> you show your cluster.conf xml?
>>>>>>>
>>>>>>> 2014-08-14 14:44 GMT+02:00 Miha <miha at softnet.si>:
>>>>>>>> emmanuel,
>>>>>>>>
>>>>>>>> tnx. But how can I find out why fencing stopped working?
>>>>>>>>
>>>>>>>> br
>>>>>>>> miha
>>>>>>>>
>>>>>>>> On 8/14/2014 2:35 PM, emmanuel segura wrote:
>>>>>>>>
>>>>>>>>> Node sip2 is "UNCLEAN (offline)" because cluster fencing failed to
>>>>>>>>> complete the operation.
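>>>>>>>>>
>>>>>>>>> To test fencing from the cluster side, a rough sketch (only run it
>>>>>>>>> against a node you can afford to lose) would be stonith_admin, which
>>>>>>>>> the logs you pasted already show cman calling:
>>>>>>>>>
>>>>>>>>> stonith_admin --reboot sip1
>>>>>>>>>
>>>>>>>>> then watch the logs on the surviving node to see whether the
>>>>>>>>> operation completes.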
>>>>>>>>>
>>>>>>>>> 2014-08-14 14:13 GMT+02:00 Miha <miha at softnet.si>:
>>>>>>>>>> hi.
>>>>>>>>>>
>>>>>>>>>> another thing.
>>>>>>>>>>
>>>>>>>>>> On node sip1, pcs is running:
>>>>>>>>>> [root@sip1 ~]# pcs status
>>>>>>>>>> Cluster name: sipproxy
>>>>>>>>>> Last updated: Thu Aug 14 14:13:37 2014
>>>>>>>>>> Last change: Sat Feb 1 20:10:48 2014 via crm_attribute on sip1
>>>>>>>>>> Stack: cman
>>>>>>>>>> Current DC: sip1 - partition with quorum
>>>>>>>>>> Version: 1.1.10-14.el6-368c726
>>>>>>>>>> 2 Nodes configured
>>>>>>>>>> 10 Resources configured
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Node sip2: UNCLEAN (offline)
>>>>>>>>>> Online: [ sip1 ]
>>>>>>>>>>
>>>>>>>>>> Full list of resources:
>>>>>>>>>>
>>>>>>>>>>  Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
>>>>>>>>>>      Masters: [ sip2 ]
>>>>>>>>>>      Slaves: [ sip1 ]
>>>>>>>>>>  Resource Group: g_mysql
>>>>>>>>>>      p_fs_mysql (ocf::heartbeat:Filesystem): Started sip2
>>>>>>>>>>      p_ip_mysql (ocf::heartbeat:IPaddr2): Started sip2
>>>>>>>>>>      p_mysql (ocf::heartbeat:mysql): Started sip2
>>>>>>>>>>  Clone Set: cl_ping [p_ping]
>>>>>>>>>>      Started: [ sip1 sip2 ]
>>>>>>>>>>  opensips (lsb:opensips): Stopped
>>>>>>>>>>  fence_sip1 (stonith:fence_bladecenter_snmp): Started sip2
>>>>>>>>>>  fence_sip2 (stonith:fence_bladecenter_snmp): Started sip2
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [root@sip1 ~]#
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/14/2014 2:12 PM, Miha wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi emmanuel,
>>>>>>>>>>>
>>>>>>>>>>> I think so. What is the best way to check?
>>>>>>>>>>>
>>>>>>>>>>> Sorry for my noob question; I configured this 6 months ago and
>>>>>>>>>>> everything was working fine till now. Now I need to find out what
>>>>>>>>>>> really happened before I do something stupid.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> tnx
>>>>>>>>>>>
>>>>>>>>>>> On 8/14/2014 1:58 PM, emmanuel segura wrote:
>>>>>>>>>>>> are you sure your cluster fencing is working?
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-08-14 13:40 GMT+02:00 Miha <miha at softnet.si>:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I noticed today that I am having a problem with the cluster: the
>>>>>>>>>>>>> master server is offline, but the virtual IP is still assigned to
>>>>>>>>>>>>> it and all services are still running properly (in production).
>>>>>>>>>>>>>
>>>>>>>>>>>>> If I check, I get these messages:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root@sip2 cluster]# pcs status
>>>>>>>>>>>>> Error: cluster is not currently running on this node
>>>>>>>>>>>>> [root@sip2 cluster]# /etc/init.d/corosync status
>>>>>>>>>>>>> corosync dead but pid file exists
>>>>>>>>>>>>> [root@sip2 cluster]# pcs status
>>>>>>>>>>>>> Error: cluster is not currently running on this node
>>>>>>>>>>>>> [root@sip2 cluster]# tailf fenced.log
>>>>>>>>>>>>> Aug 14 13:34:25 fenced cman_get_cluster error -1 112
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The main question is what to do now. Should I run "pcs start" and
>>>>>>>>>>>>> hope for the best, or what?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have pasted log in pastebin: http://pastebin.com/SUp2GcmN
>>>>>>>>>>>>>
>>>>>>>>>>>>> tnx!
>>>>>>>>>>>>>
>>>>>>>>>>>>> miha
>>>>>>>>>>>>>