[Pacemaker] Extracting resource state information from the XML

Thu Aug 11 13:08:13 UTC 2011

On 11/08/11 21:51, pskrap wrote:
>
> Hi,
>
> I have a setup with tens of resources over several nodes. The interface that is
> used to administer the system has a page showing all resources, their state and
> which node they are running on.
>
> I can get the information of one resource using 'crm_resource -W -r <rsc>' but
> running this command over and over again for that many resources is far too slow
> for my needs. The crm_mon produced web page is not enough as I need it in a
> customized format. I figured the best way to do this efficiently is to query the
> XML using cibadmin -Q, parse it and get the state of all resources from there in
> one go.
>
> Unfortunately I am not familiar with the status part of the XML. Is anyone able
> to tell me how I can find the following information in the XML:
>
> - resource state (running, stopped, failed)
> - which node the resource is currently running on

You probably want to read "Chapter 12. Status - Here be dragons" of 
Pacemaker Explained:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch-status.html

In particular, the Complex Resource History Example:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch12s03s02.html

Very roughly speaking, for each node_state, you have to look at each
lrm_rsc_op for each lrm_resource, and based on the specific op
(start, stop, monitor, promote, demote, etc.) and its return code, you
determine the state of the resource on that node.  For example, if the
last op was a successful (rc=0) start or a successful monitor, the
resource is running on that node.
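
To make that walk a bit more concrete, here is a rough Python sketch.
It is not how Pacemaker or Hawk does it, just an illustration under some
assumptions of mine: the op elements are lrm_rsc_op with "operation",
"call-id", "rc-code" and "interval" attributes, the op with the highest
call-id is treated as the last one (a simplification), and pending ops
(call-id below 0) are skipped.  The names classify, resource_states and
SAMPLE are made up for this example, and the sample XML is hand-written,
not output from a real cluster.

```python
import xml.etree.ElementTree as ET

# OCF return codes that matter for this sketch.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8


def classify(operation, rc, is_probe):
    """Map one op and its return code to running/stopped/failed/unknown."""
    if operation == "stop":
        return "stopped" if rc == OCF_SUCCESS else "failed"
    if operation == "monitor" and rc == OCF_NOT_RUNNING and is_probe:
        return "stopped"  # a probe that found nothing running is not a failure
    if rc in (OCF_SUCCESS, OCF_RUNNING_MASTER):
        if operation in ("start", "monitor", "promote", "demote"):
            return "running"
        return "unknown"  # e.g. migrate_to / migrate_from, not handled here
    return "failed"


def resource_states(cib_xml):
    """Return {resource_id: {node_name: state}} from CIB XML text."""
    root = ET.fromstring(cib_xml)
    states = {}
    for node in root.iter("node_state"):
        node_name = node.get("uname") or node.get("id")
        for rsc in node.iter("lrm_resource"):
            # Assumption: skip pending ops, which carry a negative call-id.
            ops = [op for op in rsc.findall("lrm_rsc_op")
                   if int(op.get("call-id", "-1")) >= 0]
            if not ops:
                continue
            # Assumption: highest call-id == most recent op on this node.
            last = max(ops, key=lambda op: int(op.get("call-id")))
            state = classify(last.get("operation"),
                             int(last.get("rc-code", "-1")),
                             last.get("interval", "0") == "0")
            states.setdefault(rsc.get("id"), {})[node_name] = state
    return states


# Hand-written sample for illustration only.
SAMPLE = """
<cib>
  <status>
    <node_state id="1" uname="node1">
      <lrm id="1">
        <lrm_resources>
          <lrm_resource id="web" class="ocf" provider="heartbeat" type="apache">
            <lrm_rsc_op id="web_monitor_0" operation="monitor" interval="0"
                        call-id="2" rc-code="7"/>
            <lrm_rsc_op id="web_start_0" operation="start" interval="0"
                        call-id="5" rc-code="0"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="2" uname="node2">
      <lrm id="2">
        <lrm_resources>
          <lrm_resource id="web" class="ocf" provider="heartbeat" type="apache">
            <lrm_rsc_op id="web_monitor_0" operation="monitor" interval="0"
                        call-id="2" rc-code="7"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
  </status>
</cib>
"""

if __name__ == "__main__":
    print(resource_states(SAMPLE))
```

On a live cluster you would feed it the output of "cibadmin -Q" instead
of SAMPLE.  Clones, master/slave sets and migrations need more care than
this; the docs above cover the details.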

If you're in a hurry, you might find it less painful to parse the output 
of something like "crm_mon -o -1" or "crm_mon -n -1".

Or, if you'd like to examine some hairy Ruby code for interpreting the 
CIB status section, have a look at:

http://hg.clusterlabs.org/pacemaker/hawk/file/tip/hawk/app/models/cib.rb#l300

Note though that this looks at all the ops, to record a list of what's 
failed (it's a loose transliteration of Pacemaker's C code that does the 
same thing).  If you only care about state, you probably only care about 
the *last* op.

I should also take the opportunity to plug Hawk, if you need a web-based
thing for managing Pacemaker clusters:

http://www.clusterlabs.org/wiki/Hawk

HTH,

Tim
-- 
Tim Serong
Senior Clustering Engineer
SUSE
tserong at suse.com