[Pacemaker] Postgresql Replication

Thu Sep 19 01:55:10 UTC 2013

On 12/09/2013, at 11:58 PM, Takatoshi MATSUO <matsuo.tak at gmail.com> wrote:

> Hi
> 
> 2013/9/12 Eloy Coto Pereiro <eloy.coto at gmail.com>:
>> Hi,
>> 
>> Thanks for your help, I use the same example. In this case Kamailio needs to
>> start after postgresql. But I don't think this is a problem; the replication
>> works ok without corosync. I stopped all processes and started to work with
>> corosync.
>> 
>> When I start corosync I see this log in my slave:
>> 
>> Sep 12 16:12:50 slave pgsql(pgsql)[26092]: INFO: Master does not exist.
>> Sep 12 16:12:50 slave pgsql(pgsql)[26092]: WARNING: My data is out-of-date. status=DISCONNECT
> 
> Did you start PostgreSQL on master (the node name) and did it become Master?
> These logs mean that the slave doesn't see a Master and the slave's data is old.
> (It's confusing to use the hostnames "master" and "slave".)
> 
> Please stop pacemaker and erase all configuration and re-load
> original configuration

Seems a bit extreme.  There are easier ways to flush out the operation history than that.
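
As a rough sketch of what a lighter-weight reset might look like, assuming the
resource name (msPostgresql), node name (slave) and attribute name
(pgsql-data-status) from the configuration quoted in this thread: the first
command clears the recorded operation history for the resource, the second
removes the stored data-status attribute. Whether deleting that attribute is
appropriate depends on the standby's data really being current, so treat that
step as an assumption to verify. The run() wrapper and DRY_RUN variable are
hypothetical helpers added only so the commands can be previewed first.

```shell
#!/bin/sh
# Sketch only: names below come from the configuration quoted in this thread.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Clear the operation history and fail counts recorded for the master/slave set.
run crm_resource --cleanup --resource msPostgresql

# Assumption: the standby's data is known to be current, so the stale
# permanent attribute recorded for node "slave" can be removed.
run crm_attribute --lifetime forever --node slave --name pgsql-data-status --delete
```

Run it once as-is to see what would be executed, then with DRY_RUN=0 on a
cluster node to apply it.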

> which doesn't have
> ----
> node master \
> attributes maintenance="off" pgsql-data-status="LATEST"
> node slave \
> attributes pgsql-data-status="DISCONNECT"
> ---
> Because Pacemaker records last data status.
> 
>> But all the data is the same, and if I run the slave server with plain postgres
>> the replication is ok. Any idea?
>> 
>> Cheers
>> 
>> 
>> 2013/9/12 Takatoshi MATSUO <matsuo.tak at gmail.com>
>>> 
>>> Hi Eloy
>>> 
>>> 
>>> 2013/9/12 Eloy Coto Pereiro <eloy.coto at gmail.com>:
>>>> Hi,
>>>> 
>>>> I have issues with this config. For example, when the master is running,
>>>> the corosync service uses pg_ctl. But on the slave pg_ctl doesn't start and
>>>> replication doesn't work.
>>>> 
>>>> This is my data:
>>>> 
>>>> 
>>>> Online: [ master slave ]
>>>> OFFLINE: [ ]
>>>> 
>>>> Full list of resources:
>>>> 
>>>> ClusterIP (ocf::heartbeat:IPaddr2): Started master
>>>> KAMAILIO        (lsb:kamailio): Started master
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>>     Masters: [ master ]
>>>>     Stopped: [ pgsql:1 ]
>>>> 
>>>> Node Attributes:
>>>> * Node master:
>>>>    + maintenance                       : off
>>>>    + master-pgsql                      : 1000
>>>>    + pgsql-data-status                 : LATEST
>>>>    + pgsql-master-baseline             : 0000000019000080
>>>>    + pgsql-status                      : PRI
>>>> * Node slave:
>>>>    + pgsql-data-status                 : DISCONNECT
>>>>    + pgsql-status                      : HS:sync
>>>> 
>>>> 
>>>> In my crm configure show is this:
>>>> node master \
>>>> attributes maintenance="off" pgsql-data-status="LATEST"
>>>> node slave \
>>>> attributes pgsql-data-status="DISCONNECT"
>>>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>>> params ip="10.1.1.1" cidr_netmask="24" \
>>>> op monitor interval="15s" \
>>>> op start timeout="60s" interval="0s" on-fail="stop" \
>>>> op monitor timeout="60s" interval="10s" on-fail="restart" \
>>>> op stop timeout="60s" interval="0s" on-fail="block"
>>>> primitive KAMAILIO lsb:kamailio \
>>>> op monitor interval="10s" \
>>>> op start interval="0" timeout="120s" \
>>>> op stop interval="0" timeout="120s" \
>>>> meta target-role="Started"
>>>> primitive pgsql ocf:heartbeat:pgsql \
>>>> params pgctl="/usr/pgsql-9.2/bin/pg_ctl" psql="/usr/pgsql-9.2/bin/psql" \
>>>> pgdata="/var/lib/pgsql/9.2/data/" rep_mode="sync" node_list="master slave" \
>>>> restore_command="cp /var/lib/pgsql/9.2/pg_archive/%f %p" \
>>>> primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
>>>> master_ip="10.1.1.1" restart_on_promote="true" \
>>>> op start timeout="60s" interval="0s" on-fail="restart" \
>>>> op monitor timeout="60s" interval="4s" on-fail="restart" \
>>>> op monitor timeout="60s" interval="3s" on-fail="restart" role="Master" \
>>>> op promote timeout="60s" interval="0s" on-fail="restart" \
>>>> op demote timeout="60s" interval="0s" on-fail="stop" \
>>>> op stop timeout="60s" interval="0s" on-fail="block" \
>>>> op notify timeout="60s" interval="0s"
>>>> ms msPostgresql pgsql \
>>>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
>>>> location cli-prefer-KAMAILIO KAMAILIO \
>>>> rule $id="cli-prefer-rule-KAMAILIO" inf: #uname eq master
>>>> location cli-prefer-pgsql msPostgresql \
>>>> rule $id="cli-prefer-rule-pgsql" inf: #uname eq master
>>>> location cli-standby-ClusterIP ClusterIP \
>>>> rule $id="cli-standby-rule-ClusterIP" -inf: #uname eq slave
>>> 
>>> This location is invalid.
>>> It means that ClusterIP can't run on slave.
>>> 
>>>> colocation colocation-1 inf: ClusterIP msPostgresql KAMAILIO
>>> 
>>> Does PostgreSQL need KAMAILIO to start?
>>> This means that Pacemaker can't start PostgreSQL on slave.
>>> 
>>> Sample setting is
>>>  colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>> 
>>> At the very beginning, you might want to customize sample settings.
>>> 
>>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#sample_settings_for_crm_command
>>> 
>>> And please see logs because pgsql RA outputs some useful logs.
>>> 
>>>> order order-1 inf: ClusterIP msPostgresql KAMAILIO
>>>> property $id="cib-bootstrap-options" \
>>>> dc-version="1.1.8-7.el6-394e906" \
>>>> cluster-infrastructure="classic openais (with plugin)" \
>>>> expected-quorum-votes="2" \
>>>> stonith-enabled="false"
>>>> 
>>>> Any idea why it doesn't start on the second node (slave)?
>>>> 
>>>> More info:
>>>> 
>>>> Master:
>>>> 
>>>> [root at master ~]# netstat -putan | grep 5432 | grep LISTEN
>>>> tcp        0      0 0.0.0.0:5432                0.0.0.0:*                   LISTEN      3241/postgres
>>>> tcp        0      0 :::5432                     :::*                        LISTEN      3241/postgres
>>>> [root at master ~]# ps axu | grep postgres
>>>> postgres  3241  0.0  0.0  97072  7692 ?        S    11:41   0:00 /usr/pgsql-9.2/bin/postgres -D /var/lib/pgsql/9.2/data -c config_file=/var/lib/pgsql/9.2/data//postgresql.conf
>>>> postgres  3293  0.0  0.0  97072  1556 ?        Ss   11:41   0:00 postgres: checkpointer process
>>>> postgres  3294  0.0  0.0  97072  1600 ?        Ss   11:41   0:00 postgres: writer process
>>>> postgres  3295  0.0  0.0  97072  1516 ?        Ss   11:41   0:00 postgres: wal writer process
>>>> postgres  3296  0.0  0.0  97920  2760 ?        Ss   11:41   0:00 postgres: autovacuum launcher process
>>>> postgres  3297  0.0  0.0  82712  1500 ?        Ss   11:41   0:00 postgres: archiver process   failed on 000000010000000000000001
>>>> postgres  3298  0.0  0.0  82872  1568 ?        Ss   11:41   0:00 postgres: stats collector process
>>>> root     10901  0.0  0.0 103232   852 pts/0    S+   11:44   0:00 grep postgres
>>>> 
>>>> 
>>>> On slave:
>>>> 
>>>> [root at slave ~]# ps axu | grep postgre
>>>> root      3332  0.0  0.0 103232   856 pts/0    S+   11:45   0:00 grep postgre
>>>> [root at slave ~]# netstat -putan | grep 5432
>>>> [root at slave ~]#
>>>> 
>>>> 
>>>> If I run pg_ctl /var/lib/pgsql/9.2/data/ start by hand, it works ok.
>>>> 
>>>> Any idea?
>>>> 
>>>> 
>>>> 2013/9/11 Takatoshi MATSUO <matsuo.tak at gmail.com>
>>>>> 
>>>>> Hi Eloy
>>>>> 
>>>>> Please see http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster .
>>>>> In the document, it uses virtual IP to receive connection,
>>>>> so it doesn't need to change recovery.conf.
>>>>> 
>>>>> Thanks,
>>>>> Takatoshi MATSUO
>>>>> 
>>>>> 
>>>>> 2013/9/11 Eloy Coto Pereiro <eloy.coto at gmail.com>:
>>>>>> Hi,
>>>>>> 
>>>>>> In PostgreSQL, if you use WAL replication
>>>>>> <http://wiki.postgresql.org/wiki/Streaming_Replication>, when the master
>>>>>> server fails you need to change the recovery.conf on the slave server.
>>>>>> 
>>>>>> In this case, is there any tool that executes a command and gets this
>>>>>> info when the master is down?
>>>>>> Is this the right tool for PostgreSQL replication?
>>>>>> 
>>>>>> Cheers
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
