[Pacemaker] ocf:heartbeat:anything issue with samba

Wed Jan 16 08:14:54 EST 2013

Hi,

On Wed, Jan 16, 2013 at 11:24:35AM +0100, joël LABBY wrote:
> Hi !
> 
> I'm trying to set up a two-node master/slave cluster with DRBD
> for the first time.
> 
> I've configured the IPaddr2, apache and drbd resources in Pacemaker
> without any issue, but I have a problem with smbd.
> 
> The smbd process starts fine and the shared folder is accessible
> over the network, but crm_mon reports an error:
> 
> samba_start_0 (node=ccda_master, call=2028, rc=1, status=complete):
> unknown error
> 
> In the log, I can see samba cycling through start/stop:
> 
> Jan 16 10:53:28 ccda_master pengine[3941]:  warning: unpack_rsc_op:
> Processing failed op samba_last_failure_0 on ccda_master: unknown
> error (1)
> Jan 16 10:53:28 ccda_master pengine[3941]:   notice: LogActions:
> Recover samba#011(Started ccda_master)
> Jan 16 10:53:28 ccda_master crmd[3942]:   notice:
> do_state_transition: State transition S_POLICY_ENGINE ->
> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
> origin=handle_response ]
> Jan 16 10:53:28 ccda_master crmd[3942]:     info: do_te_invoke:
> Processing graph 505 (ref=pe_calc-dc-1358330008-1565) derived from
> /var/lib/pengine/pe-input-1274.bz2
> Jan 16 10:53:28 ccda_master crmd[3942]:     info: te_rsc_command:
> Initiating action 4: stop samba_stop_0 on ccda_master (local)
> Jan 16 10:53:28 ccda_master lrmd: [3939]: info: rsc:samba:1031: stop
> Jan 16 10:53:28 ccda_master lrmd: [3939]: info: RA output:
> (samba:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/anything:
> line 60: kill: (28159) - No such process
> Jan 16 10:53:28 ccda_master crmd[3942]:     info: process_lrm_event:
> LRM operation samba_stop_0 (call=1031, rc=0, cib-update=1566,
> confirmed=true) ok
> Jan 16 10:53:28 ccda_master crmd[3942]:     info: te_rsc_command:
> Initiating action 47: start samba_start_0 on ccda_master (local)
> Jan 16 10:53:28 ccda_master lrmd: [3939]: info: rsc:samba:1032: start
> Jan 16 10:53:28 ccda_master pengine[3941]:   notice:
> process_pe_message: Transition 505: PEngine Input stored in:
> /var/lib/pengine/pe-input-1274.bz2
> 
> The processes are running:
> 
> ps -ef | grep smb
> root      5338     1  0 10:35 ?        00:00:00 /usr/sbin/smbd
> root      5340  5338  0 10:35 ?        00:00:00 /usr/sbin/smbd
> root     20427 20424  0 11:12 ?        00:00:00 su - root -c cd ;
> nohup /usr/sbin/smbd & echo $!
> root     20464   897  0 11:12 pts/0    00:00:00 grep --color=auto smb
> 
> ps -ef | grep smb
> root      5338     1  0 10:35 ?        00:00:00 /usr/sbin/smbd
> root      5340  5338  0 10:35 ?        00:00:00 /usr/sbin/smbd
> root     28190 28187  0 11:18 ?        00:00:00 su - root -c cd ;
> nohup /usr/sbin/smbd & echo $!
> root     28208   897  0 11:18 pts/0    00:00:00 grep --color=auto smb
> 
> We can see that the /usr/sbin/smbd processes stay the same, while
> the start from /usr/lib/ocf/resource.d//heartbeat/anything is retried
> over and over. It seems that the /usr/sbin/smbd process is not
> detected correctly after the first start.

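It looks like ocf:heartbeat:anything is not well suited to programs
that fork and detach into the background, i.e. normal daemons. As far
as I can tell, the agent records the PID of the process it launched
(the "echo $!" visible in your ps output), but that process exits as
soon as smbd has daemonized, so the agent later finds no such process
(hence the "kill: (28159) - No such process" in your log). If smbd has
an option to stay in the foreground, that could help. Otherwise, you'll
need to use another agent or write a small dummy wrapper script.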
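
As a rough sketch of the foreground approach: assuming that your smbd
supports the -F (--foreground) option and that your version of the
agent accepts the cmdline_options parameter, the primitive could look
something like the fragment below. Neither assumption was tested on
your setup, so please check "smbd --help" and
"crm ra info ocf:heartbeat:anything" first. The monitor operation is
optional and its interval is only an illustrative value.

```
# Sketch only: assumes smbd accepts -F (stay in the foreground) and
# that this version of ocf:heartbeat:anything has cmdline_options.
primitive samba ocf:heartbeat:anything \
        params binfile="/usr/sbin/smbd" cmdline_options="-F" \
        op monitor interval="30s"
```

With smbd kept in the foreground, the PID that the agent records
should belong to the long-running smbd process itself rather than to a
parent that exits right after forking.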

Thanks,

Dejan

> Here is my configuration:
> 
> node $id="16777226" ccda_master \
>         attributes standby="off"
> node $id="33554442" ccda_slave \
>         attributes standby="off"
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>         params ip="192.9.200.140" cidr_netmask="32" \
>         op monitor interval="30s"
> primitive apacheServer ocf:heartbeat:apache \
>         params httpd="/usr/sbin/httpd" configfile="/etc/httpd/conf/httpd.conf" \
>         op monitor interval="1min"
> primitive fs_r0 ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/CCDA" fstype="ext4"
> primitive r0 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="10" role="Master" \
>         op monitor interval="30" role="Slave"
> primitive samba ocf:heartbeat:anything \
>         params binfile="/usr/sbin/smbd"
> ms ms_r0 r0 \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation fs_ms_r0 inf: fs_r0 ms_r0:Master
> colocation samba_ms_r0 inf: samba ms_r0:Master
> colocation webserver inf: ClusterIP apacheServer
> order fs_after_ms_r0 inf: ms_r0:promote fs_r0:start
> order samba_after_ms_r0 inf: ms_r0:promote samba:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>         cluster-infrastructure="corosync" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1358271489"
> 
> 
> Does anybody have an idea about this issue? I've googled it all day
> without finding an answer.
> 
> thanks
> 
> joe
> 
> Linux ccda_master 3.6.11-1.fc17.x86_64
> pacemaker-1.1.7-2.fc17.x86_64
> corosync-2.0.3-1.fc17.x86_64
> drbd-8.3.13-1.fc17.x86_64
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org