[Pacemaker] Problem with failover/failback under Ubuntu 10.04 for Active/Passive OpenNMS

Mon Jul 5 15:40:39 UTC 2010

What mail client are you using, Dan?
It's breaking threading pretty badly if even Gmail can't figure it out.

On Mon, Jul 5, 2010 at 5:34 PM, Dan Frincu <dfrincu at streamwide.ro> wrote:
>> Hi,
>>
>> First you might want to look at the following error, see if the module
>> is available on both servers.
>>
>> (fs-opennms-config:start:stderr) FATAL: Module scsi_hostadapter not found.
>>
>> Then try to run the resource manually:
>> - go to /usr/lib/ocf/resource.d/heartbeat
>> - export OCF_ROOT=/usr/lib/ocf
>> - export OCF_RESKEY_device="/dev/drbd/by-res/config"
>> - export OCF_RESKEY_options=rw
>> - export OCF_RESKEY_fstype=xfs
>> - export OCF_RESKEY_directory="/etc/opennms"
>> - ./Filesystem start
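The manual test above can be wrapped in a small script that prints the resource agent's exit code by name, which is easier to read than a bare number. This is a sketch: the function names `ocf_rc_name` and `try_filesystem_ra` are hypothetical. The exit-code table follows the OCF resource agent conventions, and the script must be run as root on a node where the DRBD device is Primary.

```shell
#!/bin/bash
# Hypothetical helper for exercising the Filesystem resource agent by hand.
# Usage (as root):  . ./try-fs-ra.sh && try_filesystem_ra start

# Translate an OCF exit code into its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Run one action (start, monitor, stop) of the Filesystem agent with the same
# parameters as the steps above. The exports happen in a subshell so they
# do not leak into the calling shell.
try_filesystem_ra() {
    local action="${1:-start}"
    local rc
    (
        export OCF_ROOT=/usr/lib/ocf
        export OCF_RESKEY_device="/dev/drbd/by-res/config"
        export OCF_RESKEY_options=rw
        export OCF_RESKEY_fstype=xfs
        export OCF_RESKEY_directory="/etc/opennms"
        cd "$OCF_ROOT/resource.d/heartbeat" || exit 5   # 5 = OCF_ERR_INSTALLED
        ./Filesystem "$action"
    )
    rc=$?
    echo "Filesystem $action -> $rc ($(ocf_rc_name "$rc"))"
    return "$rc"
}
```

After a successful `start`, running `try_filesystem_ra monitor` should report OCF_SUCCESS. Run `try_filesystem_ra stop` before handing the device back to Pacemaker.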
>>
>> See if you encounter any errors here. Run the steps on both servers.
>> Make sure to move the DRBD resource from server to server so that the
>> mount works. You do that as follows:
>> - go to the server where the DRBD device is currently mounted and in the
>> Primary state
>> - umount /etc/opennms
>> - drbdadm secondary config
>> - move to the other server
>> - drbdadm primary config
>>
>> Also, make sure that Pacemaker doesn't interfere with these operations :)
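The manual switch-over above can be sketched as two shell functions, one run on the current Primary and one on the other node. This sketch assumes DRBD 8.x, where `drbdadm role <resource>` prints the local role first (for example `Primary/Secondary`). The function names `demote_config`, `promote_config` and `local_role` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for moving the DRBD resource "config" between nodes by hand.
# Run demote_config on the current Primary, then promote_config on the other node.
# Pacemaker must not be managing these resources while you do this.

RES=config
MNT=/etc/opennms

# Extract the local role from "drbdadm role" output such as "Primary/Secondary".
local_role() {
    echo "${1%%/*}"
}

demote_config() {
    # Unmount first: DRBD refuses to become Secondary while the device is held open.
    if mountpoint -q "$MNT"; then
        umount "$MNT" || return 1
    fi
    drbdadm secondary "$RES" || return 1
    echo "local role now: $(local_role "$(drbdadm role "$RES")")"
}

promote_config() {
    drbdadm primary "$RES" || return 1
    echo "local role now: $(local_role "$(drbdadm role "$RES")")"
}
```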
>>
>> Cheers.
>
> I get the error message about scsi_hostadapter on both nodes,
> but I can mount the DRBD device just fine.
>
> ______________________________________________________________________
>
>>  monitoring-node-01 lrmd: [994]: info: RA output:
>> (fs-opennms-config:start:stderr) /dev/drbd/by-res/config: Wrong medium
>> type
>>  monitoring-node-01 lrmd: [994]: info: RA output:
>> (fs-opennms-config:start:stderr) mount: block device /dev/drbd0 is
>> write-protected, mounting read-only
>>  monitoring-node-01 lrmd: [994]: info: RA output:
>> (fs-opennms-config:start:stderr) mount: Wrong medium type
>>  monitoring-node-01 Filesystem[2464]: ERROR: Couldn't mount filesystem
>> /dev/drbd/by-res/config on /etc/opennms
>
> The errors in the log file are DRBD-specific; they occur when you try to
> mount a resource that is in the Secondary state.
> Increase the "op start interval" for both the DRBD and Filesystem primitives
> to ~15 seconds. With a start interval of 0 (zero) seconds configured, the
> demotion of the DRBD resource from Primary to Secondary on node2 and its
> promotion to Primary on node1 are not instantaneous, so Pacemaker attempts to
> mount the filesystem before the DRBD resource is Primary. It goes into that
> huuuge 300-second timeout, but while it waits for one resource (DRBD) to time
> out, it executes the next one, the mount, which fails with the errors above
> for the reasons just given.
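To make the timing problem concrete, the following sketch shows what "wait until DRBD is Primary, then mount" means outside Pacemaker. It polls the local DRBD role for up to about 15 seconds, matching the delay suggested above, and only then attempts the mount. This is illustrative only: inside the cluster this ordering is Pacemaker's job. `wait_for_primary`, `mount_when_primary` and the `ROLE_CMD` override are hypothetical names.

```shell
#!/bin/bash
# Hypothetical illustration of the race described above: the mount only
# succeeds once the local DRBD role has become Primary, so poll for it.

# Return 0 once "drbdadm role <res>" reports a local Primary role, or 1 after
# roughly <timeout> seconds. ROLE_CMD can be overridden (e.g. for testing).
wait_for_primary() {
    local res="$1" timeout="${2:-15}" waited=0 role
    while [ "$waited" -lt "$timeout" ]; do
        role="$(${ROLE_CMD:-drbdadm role} "$res" 2>/dev/null)"
        if [ "${role%%/*}" = "Primary" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Mount only after promotion has completed; on a Secondary device the mount
# fails with "Wrong medium type", as in the log above.
mount_when_primary() {
    wait_for_primary config 15 || { echo "DRBD config never became Primary" >&2; return 1; }
    mount -t xfs -o rw /dev/drbd/by-res/config /etc/opennms
}
```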
>
> I'd also suggest adding an "op monitor" for each resource, with a reasonable
> interval and timeout, and also a mail alert.
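A minimal sketch of the monitor-plus-mail-alert suggestion, using one-shot `crm configure` commands. Several parts are assumptions: the resource IDs, the e-mail address and the subject are placeholders, and the commands define new primitives. On a cluster where `fs-opennms-config` already exists, add the `op monitor` line with `crm configure edit` instead. The `run` wrapper and `DRY_RUN` variable are hypothetical; they let you print the commands before applying them.

```shell
#!/bin/bash
# Hypothetical sketch: monitor operations and a mail alert via the crm shell.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

configure_monitoring() {
    # Filesystem primitive with start/stop timeouts and a periodic monitor.
    run crm configure primitive fs-opennms-config ocf:heartbeat:Filesystem \
        params device=/dev/drbd/by-res/config directory=/etc/opennms fstype=xfs options=rw \
        op start timeout=60s \
        op stop timeout=60s \
        op monitor interval=20s timeout=40s
    # Mail alert: the MailTo agent sends a message when it is started or stopped.
    run crm configure primitive mail-alert ocf:heartbeat:MailTo \
        params email=admin@example.com subject=OpenNMS-cluster-failover \
        op monitor interval=60s timeout=20s
}
```

The DRBD primitive needs its own monitors too; for a master/slave DRBD resource, the Master and Slave monitor operations must use different intervals.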
>
> Regards,
> Dan
>
> --
> Dan FRINCU
> Systems Engineer
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>