[Pacemaker] setup advice

Takatoshi MATSUO matsuo.tak at gmail.com
Wed Jul 3 09:59:45 EDT 2013


2013/7/3 Andrey Groshev <greenx at yandex.ru>:
>
>
> 03.07.2013, 16:26, "Takatoshi MATSUO" <matsuo.tak at gmail.com>:
>> Hi Andrey
>>
>> 2013/7/3 Andrey Groshev <greenx at yandex.ru>:
>>
>>>  03.07.2013, 06:43, "Takatoshi MATSUO" <matsuo.tak at gmail.com>:
>>>>  Hi Stefano
>>>>
>>>>  2013/7/2 Stefano Sasso <stesasso at gmail.com>:
>>>>>   Hello folks,
>>>>>     I have the following setup in mind, but I need some advice and one hint on
>>>>>   how to realize a particular function.
>>>>>
>>>>>   I have a N (>= 2) nodes cluster, with data storage on postgresql.
>>>>>   I would like to manage postgres master-slave replication in this way: one
>>>>>   node is the "master", one is the "slave", and the others are "standby"
>>>>>   nodes.
>>>>>   If the master fails, the slave becomes the master, and one of the standby
>>>>>   becomes the slave.
>>>>>   If the slave fails, one of the standby becomes the new slave.
>>>>  Does "standby" mean that PostgreSQL is stopped ?
>>>>  If the master doesn't have the WAL files which a new slave needs,
>>>>  the new slave can't connect to the master.
>>>>
>>>>  How do you solve this ?
>>>>  Copy the data or WAL archive automatically on start ?
>>>>  That may cause a timeout if PostgreSQL has a large database.
>>>>>   If one of the "standby" fails, no problem :)
>>>>>   I can correctly manage this configuration with ms and a custom script (using
>>>>>   ocf:pacemaker:Stateful as example). If the cluster is already operational,
>>>>>   the failover works fine.
>>>>>
>>>>>   My problem is about cluster start-up: in fact, only the previously running
>>>>>   master and slave own the most up-to-date data; so I would like the new
>>>>>   master to be the "old master" (or even the old slave), and the new
>>>>>   slave to be the "old slave" (but this one is not mandatory). The
>>>>>   important thing is that the new master should have up-to-date data.
>>>>>   This should happen even if the servers are booted up with some minutes of
>>>>>   delay between them. (users are very stupid sometimes).
>>>>  Latest pgsql RA embraces these ideas to manage replication.
>>>>
>>>>   1. First boot
>>>>  The RA compares data and promotes the PostgreSQL instance that has the latest data.
>>>>  The number of comparisons can be changed using the xlog_check_count parameter.
>>>>  If the monitor interval is 10 sec and xlog_check_count is 360, the RA can wait
>>>>  1 hour before promoting :)
>>>  But in this case, when the master dies, electing a new master will also take one hour.
>>>  Is that right?
>>
>> No: if the slave's data is up to date, the master changes the slave's master-score.
>> So pacemaker stops the master and promotes the slave immediately when the master dies.
>>
>
> Wait.... in function have_master_right.
>
> ....snip....
>     # get xlog locations of all nodes
>     for node in ${NODE_LIST}; do
>         output=`$CRM_ATTR_REBOOT -N "$node" -n \
>                 "$PGSQL_XLOG_LOC_NAME" -G -q 2>/dev/null`
> ....snip....
>     if [ "$new" -ge "$OCF_RESKEY_xlog_check_count" ]; then
>         newestXlog=`printf "$newfile\n" | sort -t " " -k 2,3 -r | \
>                     head -1 | cut -d " " -f 2`
>         if [ "$newestXlog" = "$mylocation" ]; then
>             ocf_log info "I have a master right."
>             $CRM_MASTER -v $PROMOTE_ME
>             return 0
>         fi
>         change_data_status "$NODENAME" "DISCONNECT"
>         ocf_log info "I don't have correct master data."
>         # reset counter
>         rm -f ${XLOG_NOTE_FILE}.*
>         printf "$newfile\n" > ${XLOG_NOTE_FILE}.0
>     fi
>
>     return 1
> }
>
> As I understand it, the xlog is checked on all nodes $OCF_RESKEY_xlog_check_count times.
> And this function is called from pgsql_replication_monitor, which in turn is called
> from pgsql_monitor.
> That is, until "monitor" has been called another $OCF_RESKEY_xlog_check_count times,
> have_master_right() will not return true.
> I can't hold the entire structure of your code in memory :)
> Or am I wrong?

have_master_right() doesn't change the master score.
So PostgreSQL is promoted immediately if a slave has master-score > 0,
regardless of the return code of have_master_right().
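
As a rough sketch (plain shell only, not pacemaker itself; the node
names and score values below are invented for illustration): Pacemaker
promotes a slave that currently holds a positive master score, while
slaves marked as not promotable carry -INFINITY.

```shell
#!/bin/sh
# Toy illustration: simulated per-node master scores. A slave whose
# score is a positive number is eligible for immediate promotion;
# -INFINITY means "can not promote".
scores="node1 -INFINITY
node2 100
node3 -INFINITY"

# Pick the first node whose score is a positive number.
promotee=`printf '%s\n' "$scores" | awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1; exit }'`
echo "$promotee"
```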

Note that there is an exception when using rep_mode=async with
3 or more nodes, because the RA cannot know which node should
be promoted.

control_slave_status()
------------------------------------------------------------------
                    if [ $number_of_nodes -le 2 ]; then
                        change_master_score "$target" "$CAN_PROMOTE"
                    else
                        # I can't determine which slave's data is newest in async mode.
                        change_master_score "$target" "$CAN_NOT_PROMOTE"
                    fi
------------------------------------------------------------------
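
The promotion candidate itself comes from the xlog-location comparison
in have_master_right() quoted earlier. That pipeline can be tried
standalone with fake data (the node names and xlog locations below are
invented; the real RA collects them via crm_attribute):

```shell
#!/bin/sh
# Fake per-node xlog locations in the "node location" form the RA
# gathers into $newfile; values are illustrative only.
newfile="node1 0000000009000098
node2 000000000A0000B0
node3 000000000800FF00"

# Same pipeline as have_master_right(): sort by location, descending,
# and keep the newest xlog position. The node whose own location
# matches $newestXlog takes the master right.
newestXlog=`printf "$newfile\n" | sort -t " " -k 2,3 -r | head -1 | cut -d " " -f 2`
echo "$newestXlog"
```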

>
>
>>>>  2. Second boot
>>>>  The master records each slave's data status in an attribute with the "-l forever" option.
>>>>  So the RA can't start PostgreSQL if the node doesn't have the latest data.
>>>>>   My idea is the following:
>>>>>   the MS resource is not started when the cluster comes up, but on startup
>>>>>   there will only be one "arbitrator" resource (started on only one node).
>>>>>   This resource reads from somewhere which node was the previous master and the
>>>>>   previous slave, and it waits up to 5 minutes to see if one of them comes up.
>>>>>   In the positive case, it forces the MS master resource to run on that node
>>>>>   (and starts it); in the negative case, once the wait timer has expired, it starts
>>>>>   the master resource on a random node.
>>>>>
>>>>>   Is that possible? How can I prevent a single resource from starting on cluster boot?
>>>>>   Or, could you advise another way to do this setup?
>>>>>
>>>>>   I hope I was clear, my english is not so good :)
>>>>>   thank you so much,
>>>>>      stefano
>>>>>
>>>>>   --
>>>>>   Stefano Sasso
>>>>>   http://stefano.dscnet.org/
>>>>  Regards,
>>>>  Takatoshi MATSUO



