[Pacemaker] continue starting chain with failed group resources

Wed Dec 15 01:18:16 UTC 2010

Sent: Tue Dec 14 2010 11:37:06 GMT-0700 (Mountain Standard Time)
From: Dejan Muhamedagic <dejanmm at fastmail.fm>
To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] continue starting chain with failed group resources
> Hi,
>
> On Mon, Dec 13, 2010 at 10:43:36PM -0700, Patrick H. wrote:
>   
>> After tinkering with this for a few hours I finally have something working.
>>
>> colocation co-raid inf: ( md_raid iscsi_1 iscsi_2 iscsi_3 )
>>     
>
> This should be a no-op. You'd want something like this, I think:
>
> colocation co-raid inf: md_raid ( iscsi_1 iscsi_2 iscsi_3 )
>
>   
No, that makes the md_raid service depend on all the iSCSI services 
being started, which I don't want.

>> order or-raid 0: ( iscsi_1 iscsi_2 iscsi_3 ) md_raid
>>
>> Got rid of the group, changed the score on the order to 0, and
>> changed the grouping of both the colocation and order. This
>> *appears* to function as intended, but if anyone can point out any
>> pitfalls I'd appreciate it
>>
>> -Patrick
>>
>> Sent: Mon Dec 13 2010 21:12:04 GMT-0700 (Mountain Standard Time)
>> From: Patrick H. <pacemaker at feystorm.net>
>> To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>> Subject: [Pacemaker] continue starting chain with failed group resources
>>     
>>> Is there a way to continue down a chain of starting resources once
>>> a previous resource has tried to start, regardless of whether the
>>> attempt was successful?
>>>       
>
> No, that's currently not possible to express. I think that you
> should take the iSCSI resources out of the cluster and let them
> start on boot _before_ the cluster manager. If there are not
> enough disks, then the md_raid resource is going to fail.
>   
Can't do that either. When the node that is currently using the iSCSI 
services fails, they have to be migrated over to another host so it can 
assemble them into a RAID array. If they're not being managed by 
Pacemaker, they won't migrate.

I made a few more tweaks to the configuration I posted earlier and it 
seems to work pretty well, with only one exception:
colocation co-raid inf: ( md_raid iscsi_1 iscsi_2 iscsi_3 )
order or-raid_start 0: ( iscsi_1:start iscsi_2:start iscsi_3:start ) md_raid:start
order or-raid_stop inf: md_raid:stop ( iscsi_1:stop iscsi_2:stop iscsi_3:stop )

That makes it so that when they start up, they start in order, but it 
isn't required that every iSCSI resource start successfully before 
md_raid, just that each one has tried to start.
When they stop, it's mandatory that they stop in that order, so that 
no iSCSI service will stop while md_raid is still running.
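
As a rough illustration of the difference between the two scores described 
above, here is a toy Python model. It is not Pacemaker code: the function 
name and the result encoding are invented for this sketch, and it only 
mirrors the behaviour described in this message (score 0 means the starts 
must have been attempted, inf means they must have succeeded).

```python
# Toy model only: this is not Pacemaker code, and all names are hypothetical.

def may_start_md_raid(iscsi_results, mandatory):
    """Decide whether md_raid may be started.

    iscsi_results maps a resource name to True (started), False (start
    failed), or None (start not attempted yet).
    """
    if any(result is None for result in iscsi_results.values()):
        return False  # ordering: wait until every start has been attempted
    if mandatory:
        return all(iscsi_results.values())  # inf: every start must succeed
    return True  # score 0: attempts are enough, failures are tolerated


results = {"iscsi_1": True, "iscsi_2": False, "iscsi_3": True}
print(may_start_md_raid(results, mandatory=False))  # advisory ordering
print(may_start_md_raid(results, mandatory=True))   # mandatory ordering
```

With one failed iSCSI start, the advisory case still lets md_raid try to 
assemble the array, while the mandatory case blocks it.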

The exception I mentioned is a bug in the policy engine (bug 2435): the 
policy engine allows resources within a colocation set to start on other 
nodes. So if I were to stop one of the iSCSI services and then start it 
again, it might start on a different node. Unless this bug gets fixed 
soon, I'll probably modify the iSCSI script so that all the iSCSI 
devices are under one resource.
> Thanks,
>
> Dejan
>
>   
>>> I've got 3 iSCSI resources which are in a group, and then an md
>>> raid-5 array as another resource. I have the raid array resource
>>> set to start after the group with a colocation rule, but it will
>>> only start if the whole group comes up. Since this is raid-5, we
>>> can obviously handle some disk failure and start up anyway. So how
>>> do I get it to try to start it up once all the iSCSI resources
>>> have tried to start? Went looking through the docs and didn't find
>>> anything.
>>>
>>> Note: there will be other resources in the chain (like mounting
>>> the filesystem) that I don't want to try and start if the raid
>>> array resource didn't start.
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker