[Pacemaker] PostgreSQL replication RA: PGSQL.lock
Andrew
nitr0 at seti.kr.ua
Thu Feb 14 08:38:10 UTC 2013
14.02.2013 10:03, Takatoshi MATSUO wrote:
> Hi
>
> 2013/2/13 Andrew <nitr0 at seti.kr.ua>:
>> 12.02.2013 02:35, Takatoshi MATSUO wrote:
>>
>>> Hi
>>>
>>> 2013/2/9 Andrew <nitr0 at seti.kr.ua>:
>>>> Hi all.
>>>> Why is PGSQL.lock implemented in the RA, and what problems may happen if
>>>> it is removed from the RA code?
>>> It may cause data inconsistency.
>>> If the file exists on a node, you need to copy the data from the new master.
>> I noticed that during master migration the lock file still remains, and PostgreSQL
>> isn't started on the old master; demote will also fail while the lock file exists. Also, if
>> the cluster fails (for example, a power failure occurs), the old master will not start,
>> and the slave will be promoted to master after startup - that's OK when both nodes
>> crashed simultaneously, but really bad when the old slave crashed
>> earlier. If PostgreSQL crashed or was killed by the OOM killer, etc., it also will not be
>> restarted...
> The existence of the lock file does not necessarily mean that the data is inconsistent.
> The RA can't know the detailed data status.
>
> If you know that the data is valid, you can delete the lock file and clear
> the failcount.
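The manual recovery suggested above might look like the following in practice. The lock-file path and the resource name "pgsql" are assumptions, not taken from this thread (the pgsql RA keeps PGSQL.lock under its tmpdir parameter, often /var/lib/pgsql/tmp), so check them against your configuration first:

```shell
# ASSUMPTIONS: lock path and resource name "pgsql" are examples only.
# Only do this after confirming the data on this node is consistent!
rm -f /var/lib/pgsql/tmp/PGSQL.lock

# Clear the failcount so Pacemaker will try to start the resource again
crm resource cleanup pgsql
```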
Really - the RA can check the last log replay position and choose its behaviour
(start the old 'master' as master if its log position is ahead of the old
slave's; fail, or try to start as slave and fail if it isn't synced within a
timeout; or force a sync if its log position is behind the old slave's).
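A position comparison like the one described can be sketched in shell: WAL locations such as those returned by pg_last_xlog_replay_location() have the hexadecimal form "hi/lo", so they can be converted to integers and compared. A sketch only; the sample values below are made up:

```shell
# Convert an xlog location like "0/3000060" to a single integer
xlog_to_num() {
    local hi=${1%/*} lo=${1#*/}
    echo $(( (0x$hi << 32) | 0x$lo ))
}

old_master=$(xlog_to_num "0/3000060")   # hypothetical replay position
old_slave=$(xlog_to_num "0/2FFFFF0")    # hypothetical replay position
if [ "$old_master" -gt "$old_slave" ]; then
    echo "old master is ahead"          # printed for these sample values
fi
```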
>> Maybe it would be better to watch the log files on a slave that tries to sync
>> with the master / to check the slave's timeline, and if the slave can't sync
>> because its timeline differs - to fail it with an error (or even to resync it
>> from the master with pg_basebackup - it supports connecting to a remote server
>> and works quickly; see
>> http://sharingtechknowledge.blogspot.com/2011/12/postgresql-pgbasebackup-forget-about.html
>> for an example)?
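A resync with pg_basebackup, as suggested, could look roughly like this on the broken slave. The hostname, replication user, and data directory are placeholders, not values from this thread:

```shell
# ASSUMPTIONS: "master-node", the "replication" user, and the data
# directory are examples - substitute your own values.
pg_ctl -D /var/lib/pgsql/data stop -m fast    # make sure postgres is down
rm -rf /var/lib/pgsql/data/*                  # pg_basebackup needs an empty dir
pg_basebackup -h master-node -U replication \
    -D /var/lib/pgsql/data -x -P              # -x includes the required WAL
# then recreate recovery.conf and let Pacemaker start it as a slave
```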
>>
>>
>>>> Also, a 2nd question: how can I prevent the pgsql RA from promoting a
>>>> master before both nodes come up OR before a timeout is reached (for
>>>> example, if the 2nd node is dead)?
>>> You can use the xlog_check_count parameter set to a large number.
>>> The RA retries comparing data the specified number of times while in Slave.
>> Thanks; I'll try this.
>>
>>> Or you can use "target-role" such as below too.
>>> ----
>>> ms msPostgresql pgsql \
>>> meta master-max="1" master-node-max="1" clone-max="2"
>>> clone-node-max="1" notify="true" target-role="Slave"
>>> ----
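With target-role="Slave", the master/slave set stays unpromoted. Once both nodes are up, promotion could then be allowed by flipping the meta attribute, e.g. with crmsh (a sketch - Pacemaker still picks the node by master-score):

```shell
# Allow promotion once you are satisfied that both nodes are up
crm resource meta msPostgresql set target-role Master
```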
>> In that case, how can I choose on which node the resource should be promoted
>> to master (i.e. which has the fresher WAL position)? Should I do this
>> manually, or can I just run promote?
>>
> In a master/slave configuration, the RA decides which node can be promoted
> using the master-score,
> and Pacemaker promotes it considering "colocation", "order", "rule" and so on.
> So you can't promote it manually.
>
> But as far as pgsql RA goes, you can do it such as below
>
> 1. stop all pacemakers
> 2. clear all settings of pacemaker such as "rm
> /var/lib/heartbeat/crm/cib*" in both nodes.
> 3. start pacemaker in one server which should be Master.
> -> RA certainly increments master-score in Slave and PostgreSQL is promoted
> because there is no pgsql-data-status and no other node.
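The three steps above could be sketched as shell commands. The CIB path follows the "rm /var/lib/heartbeat/crm/cib*" hint in the text; the init-script name is an assumption and varies by distribution and cluster stack:

```shell
# On BOTH nodes: stop the cluster stack and wipe the saved CIB
service heartbeat stop              # or: service pacemaker stop
rm -f /var/lib/heartbeat/crm/cib*

# On the node that should become Master ONLY:
service heartbeat start
# With no pgsql-data-status recorded and no other node present,
# the RA raises the master-score and PostgreSQL is promoted here.
```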
>
Ok, thanks. I'm not too familiar with Pacemaker, so some operational
details are still hidden from me.
But for master migration there is a much easier solution: migrate the
colocated master IP.
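Assuming the virtual IP is a separate resource colocated with the Master role (the resource name "vip-master" and the hostname below are hypothetical), that migration could be done with crmsh:

```shell
# Move the colocated master IP to the other node; per the suggestion
# above, the pgsql master follows it if the colocation is set up that way
crm resource migrate vip-master node2   # "node2" is a placeholder hostname
```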