[Pacemaker] Help on setting order of resources

Sat Aug 18 16:16:07 EDT 2012

Short description 
----------------------- 
Corosync ignores my resource ordering settings. 

Final goal
-----------
Make Zimbra highly available (HA). 

Description of the system 
----------------------------------- 
This is an Ubuntu 10.04 LTS system, because the current stable Zimbra release runs on Ubuntu 10.04 but not yet on 12.04. 

I've dist-upgraded the packages from https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa, as advised on several sites. 

My main configuration is based on this document: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 

I've written some OCF resource agents of my own (for Zimbra and some network tasks) and tested them with ocf-tester and ocf-tester-py (my own hack of ocf-tester that can test Python-based OCF scripts). 

Finally, some package versions: 

libcrmcluster1 1.1.6-2ubuntu0~ppa2 
libcrmcommon2 1.1.6-2ubuntu0~ppa2 
corosync 1.4.2-1ubuntu0~ppa1 
libcorosync4 1.4.2-1ubuntu0~ppa1 
lvm2 2.02.54-1ubuntu4.1ppa5 
pacemaker 1.1.6-2ubuntu0~ppa2 
libglib2.0-0 2.24.1-0ubuntu1.1~ppa1 
cluster-glue 1.0.8-2ubuntu0~ppa4 
libcluster-glue 1.0.8-2ubuntu0~ppa4 
resource-agents 1:3.9.2-4ubuntu0~ppa2 

crm configure show output: 
----------------------------------- 

adrian@zhatest-01:~$ sudo crm configure show 
node zhatest-01.domain.com 
node zhatest-02.domain.com 
primitive ClusterDefaultRoute ocf:btactic:OVHdefaultroute \ 
op monitor interval="30s" 
primitive ClusterHostRoute ocf:btactic:OVHhostroute \ 
params device="eth0" \ 
op monitor interval="30s" 
primitive ClusterIP ocf:heartbeat:IPaddr2 \ 
params nic="eth0" ip="1.2.3.4" cidr_netmask="32" broadcast="1.2.3.4" \ 
op monitor interval="30s" 
primitive ClusterOVHFailover ocf:btactic:OVHfailover \ 
op monitor interval="120s" timeout="60s" \ 
op start interval="0" timeout="660" \ 
op stop interval="0" timeout="660" \ 
params nichandle="MYLOGIN" password="MYSECRET" failover="1.2.3.4" \ 
meta target-role="Started" 
primitive ZimbraData ocf:linbit:drbd \ 
params drbd_resource="zimbradata" \ 
op monitor interval="60s" role="Master" \ 
op monitor interval="50s" role="Slave" \ 
op start interval="0" role="Master" timeout="240" \ 
op start interval="0" role="Slave" timeout="240" \ 
op stop interval="0" role="Master" timeout="100" \ 
op stop interval="0" role="Slave" timeout="100" 
primitive ZimbraFS ocf:heartbeat:Filesystem \ 
params device="/dev/drbd/by-res/zimbradata" directory="/opt/zimbra" fstype="ext4" \ 
op start interval="0" timeout="60s" \ 
op stop interval="0" timeout="60s" 
primitive ZimbraServer ocf:btactic:zimbra \ 
op monitor interval="2min" \ 
op start interval="0" timeout="360s" \ 
op stop interval="0" timeout="360s" 
group MySystem ClusterOVHFailover ClusterIP ClusterHostRoute ClusterDefaultRoute 
group MyZimbra ZimbraFS ZimbraServer 
ms ZimbraDataClone ZimbraData \ 
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" 
location prefer-zhatest-01 MyZimbra 50: zhatest-01.domain.com 
colocation everything-together inf: MySystem ZimbraDataClone:Master MyZimbra 
order everything-ordered inf: MySystem ZimbraDataClone:promote MyZimbra 
property $id="cib-bootstrap-options" \ 
no-quorum-policy="ignore" \ 
stonith-enabled="false" \ 
dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ 
cluster-infrastructure="openais" \ 
expected-quorum-votes="2" 
rsc_defaults $id="rsc-options" \ 
resource-stickiness="100" 

crm_mon -orVVVV1 output: 
---------------------------------- 
crm_mon[4215]: 2012/08/18_19:46:39 info: main: Starting crm_mon 
crm_mon[4215]: 2012/08/18_19:46:39 info: unpack_config: Startup probes: enabled 
crm_mon[4215]: 2012/08/18_19:46:39 notice: unpack_config: On loss of CCM Quorum: Ignore 
crm_mon[4215]: 2012/08/18_19:46:39 info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0 
crm_mon[4215]: 2012/08/18_19:46:39 info: unpack_domains: Unpacking domains 
crm_mon[4215]: 2012/08/18_19:46:39 info: determine_online_status: Node zhatest-01.domain.com is online 
crm_mon[4215]: 2012/08/18_19:46:39 notice: unpack_rsc_op: Hard error - ZimbraServer_last_failure_0 failed with rc=5: Preventing ZimbraServer from re-starting on zhatest-01.domain.com 
============ 
Last updated: Sat Aug 18 19:46:39 2012 
Last change: Sat Aug 18 18:09:51 2012 via crmd on zhatest-01.domain.com 
Stack: openais 
Current DC: zhatest-01.domain.com - partition WITHOUT quorum 
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c 
2 Nodes configured, 2 expected votes 
8 Resources configured. 
============ 

Online: [ zhatest-01.domain.com ] 
OFFLINE: [ zhatest-02.domain.com ] 

Full list of resources: 

Resource Group: MySystem 
ClusterOVHFailover (ocf::btactic:OVHfailover): Stopped 
ClusterIP (ocf::heartbeat:IPaddr2): Stopped 
ClusterHostRoute (ocf::btactic:OVHhostroute): Stopped 
ClusterDefaultRoute (ocf::btactic:OVHdefaultroute): Stopped 
Resource Group: MyZimbra 
ZimbraFS (ocf::heartbeat:Filesystem): Stopped 
ZimbraServer (ocf::btactic:zimbra): Stopped 
Master/Slave Set: ZimbraDataClone [ZimbraData] 
Slaves: [ zhatest-01.domain.com ] 
Stopped: [ ZimbraData:1 ] 

Operations: 
* Node zhatest-01.domain.com: 
ZimbraData:0: migration-threshold=1000000 
+ (9) start: rc=0 (ok) 
+ (11) monitor: interval=50000ms rc=0 (ok) 
ZimbraServer: migration-threshold=1000000 
+ (7) probe: rc=5 (not installed) 

Failed actions: 
ZimbraServer_monitor_0 (node=zhatest-01.domain.com, call=7, rc=5, status=complete): not installed 
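
For readers decoding the "Failed actions" line above, here is a minimal Python sketch, not part of the cluster setup. The exit-code table lists the standard OCF resource agent return codes. The regular expression and the function name `describe_failed_action` are hypothetical and only assume the line format shown in this output. Treating a `monitor` operation with interval 0 as a probe is consistent with crm_mon reporting call 7 as "probe" in the operations list.

```python
import re

# Standard OCF resource agent exit codes.
OCF_EXIT_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}

# Hypothetical pattern for one "Failed actions" line:
#   <resource>_<operation>_<interval ms> (node=..., call=..., rc=..., status=...)
FAILED_ACTION = re.compile(
    r"^\s*(?P<rsc>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+) "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), "
    r"rc=(?P<rc>\d+), status=(?P<status>\w+)\)"
)


def describe_failed_action(line):
    """Split a failed-action line into its fields; return None if it does not match."""
    match = FAILED_ACTION.match(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    interval_ms = int(match.group("interval"))
    return {
        "resource": match.group("rsc"),
        "operation": match.group("op"),
        # A monitor with interval 0 is the one-off probe, not a recurring monitor.
        "is_probe": match.group("op") == "monitor" and interval_ms == 0,
        "node": match.group("node"),
        "call": int(match.group("call")),
        "rc": rc,
        "rc_name": OCF_EXIT_CODES.get(rc, "unknown"),
    }


line = ("ZimbraServer_monitor_0 (node=zhatest-01.domain.com, call=7, "
        "rc=5, status=complete): not installed")
print(describe_failed_action(line))
```

Applied to the line above, this identifies the failure as a probe of ZimbraServer that returned OCF_ERR_INSTALLED, which matches the "not installed" text printed by crm_mon.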

Long description: 
----------------------- 
I expect the system to try to start the resources in the following order: 
MySystem ZimbraDataClone:Master MyZimbra 
which, after expanding the group members, is: 
ClusterOVHFailover ClusterIP ClusterHostRoute \ 
ClusterDefaultRoute ZimbraDataClone:Master \ 
ZimbraFS ZimbraServer 
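
The group expansion described above can be written out as a small Python sketch. It only illustrates the sequence I expect and says nothing about how Pacemaker actually evaluates the constraint. The function name `expand_order` is hypothetical; the group membership is copied from the `group` lines of the configuration.

```python
# Group membership copied from the `group` lines in the configuration.
GROUPS = {
    "MySystem": [
        "ClusterOVHFailover",
        "ClusterIP",
        "ClusterHostRoute",
        "ClusterDefaultRoute",
    ],
    "MyZimbra": ["ZimbraFS", "ZimbraServer"],
}


def expand_order(resources, groups=GROUPS):
    """Replace each group name with its members; keep other names unchanged."""
    expanded = []
    for name in resources:
        expanded.extend(groups.get(name, [name]))
    return expanded


expected = expand_order(["MySystem", "ZimbraDataClone:Master", "MyZimbra"])
print(" -> ".join(expected))
```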

Judging by the operation history that crm_mon -o shows in the log above, corosync insists on starting ZimbraData first, and I don't want that. 
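
To make that observation concrete, here is a rough Python sketch that sorts the entries from the "Operations:" section by call id. It assumes that a lower call id means the operation was issued earlier on that node, which is my assumption and not something the output states. The tuple layout and function names are hypothetical.

```python
# (call id, resource, operation, rc) copied from the "Operations:" section.
OPERATIONS = [
    (9, "ZimbraData:0", "start", 0),
    (11, "ZimbraData:0", "monitor", 0),
    (7, "ZimbraServer", "probe", 5),
]


def by_call_id(operations):
    """Order operations by call id (assumed to reflect issue order on the node)."""
    return sorted(operations, key=lambda entry: entry[0])


def started_resources(operations):
    """Return the resources with a successful start, in call-id order."""
    return [
        resource
        for _, resource, operation, rc in by_call_id(operations)
        if operation == "start" and rc == 0
    ]


for call_id, resource, operation, rc in by_call_id(OPERATIONS):
    print(f"({call_id}) {resource} {operation}: rc={rc}")

print("Started:", started_resources(OPERATIONS))
```

Under that assumption, the only resource with a recorded start is ZimbraData:0, while no member of MySystem appears in the history at all.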

So, that's it. Am I missing something? If you need more logs, don't hesitate to ask for them. 
Thank you! 

Other questions 
--------------------- 
Where is the probe operation, which appears in the crm_mon output, documented? 

P.S.: This unanswered email is very similar to my issue: http://lists.linux-ha.org/pipermail/linux-ha/2011-May/043144.html 

-- 
Adrián Gibanel 
I.T. Manager 

+34 675 683 301 
www.btactic.com 
