[Pacemaker] setup advice
Andrey Groshev
greenx at yandex.ru
Wed Jul 3 13:05:34 UTC 2013
03.07.2013, 16:26, "Takatoshi MATSUO" <matsuo.tak at gmail.com>:
> Hi Andrey
>
> 2013/7/3 Andrey Groshev <greenx at yandex.ru>:
>
>> 03.07.2013, 06:43, "Takatoshi MATSUO" <matsuo.tak at gmail.com>:
>>> Hi Stefano
>>>
>>> 2013/7/2 Stefano Sasso <stesasso at gmail.com>:
>>>> Hello folks,
>>>> I have the following setup in mind, but I need some advice and one hint on
>>>> how to realize a particular function.
>>>>
>>>> I have an N-node (N >= 2) cluster, with data stored in PostgreSQL.
>>>> I would like to manage postgres master-slave replication in this way: one
>>>> node is the "master", one is the "slave", and the others are "standby"
>>>> nodes.
>>>> If the master fails, the slave becomes the master, and one of the standby
>>>> nodes becomes the slave.
>>>> If the slave fails, one of the standby nodes becomes the new slave.
>>> Does "standby" mean that PostgreSQL is stopped?
>>> If the master no longer has the WAL files that a new slave needs,
>>> the new slave can't connect to the master.
>>>
>>> How do you solve that?
>>> By copying the data or the WAL archive automatically on start?
>>> That may time out if PostgreSQL has a large database.
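For context on the WAL problem above: one common way to keep the needed WAL around is archiving plus a generous wal_keep_segments. A minimal postgresql.conf sketch for the master (PostgreSQL 9.x-era settings; the archive path is illustrative, not taken from this setup):

wal_level = hot_standby          # required for streaming replication / hot standby
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/archive/%f'   # illustrative path
wal_keep_segments = 256          # keep extra WAL so a rejoining slave can catch up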
>>>> If one of the "standby" nodes fails, no problem :)
>>>> I can manage this configuration correctly with an ms resource and a custom
>>>> script (using ocf:pacemaker:Stateful as an example). If the cluster is
>>>> already operational, failover works fine.
>>>>
>>>> My problem is cluster start-up: only the previously running master and
>>>> slave hold the most up-to-date data, so I would like the new master to
>>>> be the "old master" (or even the old slave), and the new slave to be
>>>> the "old slave" (though this last part is not mandatory). The important
>>>> thing is that the new master must have up-to-date data.
>>>> This should hold even if the servers are booted up with some minutes of
>>>> delay between them (users are very stupid sometimes).
>>> The latest pgsql RA embraces these ideas to manage replication.
>>>
>>> 1. First boot
>>> The RA compares the nodes' data and promotes the PostgreSQL instance
>>> that has the latest data.
>>> The number of comparisons can be changed with the xlog_check_count
>>> parameter. If the monitor interval is 10 sec and xlog_check_count is
>>> 360, the RA can wait 1 hour before promoting :)
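For illustration, a sketch of how those two knobs meet in a crm configuration: 360 checks x 10 s = 3600 s = 1 hour. Resource names, node names, the IP, and the paths below are illustrative, not from this thread:

primitive pgsql ocf:heartbeat:pgsql \
    params pgctl="/usr/bin/pg_ctl" psql="/usr/bin/psql" \
        pgdata="/var/lib/pgsql/data" rep_mode="sync" \
        node_list="node1 node2 node3" master_ip="192.168.0.100" \
        xlog_check_count="360" \
    op monitor interval="10s" timeout="60s" \
    op monitor interval="9s" role="Master" timeout="60s"
ms msPostgresql pgsql \
    meta master-max="1" master-node-max="1" clone-max="3" notify="true"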
>> But in this case, when the master dies, won't electing a new master
>> also take an hour?
>> Is that right?
>
> No: if the slave's data is up to date, the master updates the slave's master score.
> So Pacemaker stops the master and promotes the slave immediately when the master dies.
>
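A sketch of that mechanism (not the RA's exact code; the score values and node variable are illustrative): on the current master, the RA's monitor action keeps each slave's master score in step with its replication state, roughly like this:

# slave is in sync: make it promotable on master failure
crm_master -l reboot -N "$slave_node" -v 100
# slave has fallen behind: never promote it
crm_master -l reboot -N "$slave_node" -v -INFINITY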
Wait.... look at the function have_master_right:
....snip....
    # get xlog locations of all nodes
    for node in ${NODE_LIST}; do
        output=`$CRM_ATTR_REBOOT -N "$node" -n \
                "$PGSQL_XLOG_LOC_NAME" -G -q 2>/dev/null`
....snip....
    if [ "$new" -ge "$OCF_RESKEY_xlog_check_count" ]; then
        newestXlog=`printf "$newfile\n" | sort -t " " -k 2,3 -r | \
                    head -1 | cut -d " " -f 2`
        if [ "$newestXlog" = "$mylocation" ]; then
            ocf_log info "I have a master right."
            $CRM_MASTER -v $PROMOTE_ME
            return 0
        fi
        change_data_status "$NODENAME" "DISCONNECT"
        ocf_log info "I don't have correct master data."
        # reset counter
        rm -f ${XLOG_NOTE_FILE}.*
        printf "$newfile\n" > ${XLOG_NOTE_FILE}.0
    fi
    return 1
}
As I understand it, the xlog location is checked on all nodes $OCF_RESKEY_xlog_check_count times.
This function is called from pgsql_replication_monitor, which in turn is called from pgsql_monitor.
That is, until "monitor" has been called another $OCF_RESKEY_xlog_check_count times, have_master_right will not return true.
I'm recalling the structure of your code from memory :)
Or am I wrong?
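To make the counting concrete, a simplified fragment in the spirit of have_master_right (not the RA's exact code); it only shows that a promotion decision is made once per xlog_check_count monitor cycles:

    # each monitor call records one xlog snapshot file; promotion is only
    # evaluated once xlog_check_count snapshots have accumulated
    new=0
    while [ -f "${XLOG_NOTE_FILE}.${new}" ]; do
        new=`expr $new + 1`
    done
    printf "$newfile\n" > "${XLOG_NOTE_FILE}.${new}"
    if [ "$new" -lt "$OCF_RESKEY_xlog_check_count" ]; then
        return 1    # not enough monitor cycles yet
    fi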
>>> 2. Second boot
>>> The master tracks each slave's data status in an attribute set with the
>>> "-l forever" option, so the RA won't start PostgreSQL on a node that
>>> doesn't have the latest data.
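A sketch of that persistence trick (the attribute name and values are illustrative of the RA's data-status bookkeeping, not copied from it): a forever-lifetime attribute survives reboots, so the master's verdict on each node's data is still visible at the second boot.

crm_attribute -l forever -N "$node" -n "pgsql-data-status" -v "LATEST"
crm_attribute -l forever -N "$node" -n "pgsql-data-status" -v "DISCONNECT"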
>>>> My idea is the following:
>>>> the MS resource is not started when the cluster comes up; on startup
>>>> there is only one "arbitrator" resource (started on only one node).
>>>> This resource reads from somewhere which node was the previous master
>>>> and which was the previous slave, and it waits up to 5 minutes to see
>>>> if one of them comes up. If one does, it forces the MS master resource
>>>> onto that node (and starts it); if instead the wait timer expires, it
>>>> starts the master resource on a random node.
>>>>
>>>> Is that possible? How can I keep a single resource from starting at
>>>> cluster boot? Or could you advise another way to do this setup?
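One possible shape for the "don't start at boot" part (a sketch with illustrative names, not a tested configuration): give the MS resource target-role="Stopped" in the CIB, and have the arbitrator add a location preference and flip the role once it has chosen a node.

# crm configure (sketch): the MS resource stays down until explicitly started
ms msPostgresql pgsql \
    meta master-max="1" notify="true" target-role="Stopped"
location prefer-old-master msPostgresql \
    rule $role=Master 200: #uname eq node1

# run by the arbitrator once it has made its choice (sketch):
crm_resource -r msPostgresql --meta -p target-role -v Started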
>>>>
>>>> I hope I was clear; my English is not so good :)
>>>> thank you so much,
>>>> stefano
>>>>
>>>> --
>>>> Stefano Sasso
>>>> http://stefano.dscnet.org/
>>> Regards,
>>> Takatoshi MATSUO