[Pacemaker] Postresql streaming replication support

Mon Jan 14 21:27:33 EST 2013

Thank you Takatoshi, 

  I appreciate your help on this. I just got pulled onto another project, but will be coming back to this in a few days. I thought I had tried this, but had issues consistently getting test-db1 to come up initially. I need to debug that further. I had the configuration the same as you had presented to me, so it may be the order in which I was bringing them up as you suggest here. 

  When I can get back on to this, I will let you know. I think this is better way than what we are currently looking at doing with having scripts do the promoting/demoting and syncing, following a dummy resource. 

Thanks again, I REALLY appreciate the help. 

Keith

________________________________________
From: Takatoshi MATSUO [matsuo.tak at gmail.com]
Sent: Monday, January 14, 2013 7:55 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Postresql streaming replication support

Hi Keith

> When I set the master_ip as the Cluster IP, Postgres does not come up because it is looking for the virtual IP that is dependant of Postgres master. Kind of a catch 22?

Slave's PostgreSQL ?
Slave connects to Virtual IP which is started on Master,
so it's normal.
Please check pg_hba.conf if Slave can't connect to Master through ClusterIP.

Pacemaker and PostgreSQL work as follows.

== You start Pacemaker on test-db1 ==
1. Pacemkaer starts　PostgreSQL as Slave on test-db1
2. Pacemaker promotes PostgreSQL on test-db1
   Note: If pgsql-data-status is not "LATEST" or "STREAMING|SYNC", RA
can't promote it.
3. Pacemaker starts ClusterIP on test-db1
== You sync data  (from test-db1 to test-db2) ==
== You start Pacemaker on test-db2 ==
4. Pacemaker start PostgreSQL as Slave on test-db2
5. PostgreSQL connects to test-db1's ClusterIP

crm_mon shows as follows.

# crm_mon -Af
-----------
Online: [ test-db1 test-db2 ]

ClusterIP       (ocf::heartbeat:IPaddr2):       Started test-db1
 Master/Slave Set: msPostgresql
     Masters: [ test-db1 ]
     Slaves: [ test-db2 ]

Node Attributes:
* Node test-db1:
    + master-pgsql:0                    : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000024028270
    + pgsql-status                      : PRI
* Node test-db2:
    + master-pgsql:1                    : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync

Migration summary:
* Node test-db1:
* Node test-db2:
-----------

Regards,
Takatoshi MATSUO

2013/1/12 Keith Ouellette <Keith.Ouellette at airgas.com>:
> When I set the master_ip as the Cluster IP, Postgres does not come up because it is looking for the virtual IP that is dependant of Postgres master. Kind of a catch 22?
>
> When I configure the master_ip as the IP of one of the servers, Postgres only comes up if that one is available. I will try the repuser.
>
> Thanks,
> Keith
>
>
> ________________________________________
> From: Takatoshi MATSUO [matsuo.tak at gmail.com]
> Sent: Friday, January 11, 2013 6:41 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Postresql streaming replication support
>
> Hi Keith
>
> Can Slave access ClusterIP ?
> And can Master receive it ?
>
> You can change "user=postgres" using repuser parameter.
> Please use .pgpass file to change password.
>
> Thanks,
> Takatoshi MATSUO
>
> 2013/1/12 Keith Ouellette <Keith.Ouellette at airgas.com>:
>> Thank you again. It looks like it is trying now, but when I use the virtual IP I get the following in the postgres log (../pg_log/):
>>
>> cp: cannot stat '/opt/PostgreSQL/9.2/data/archive/000000010000000100000021': No such file or directory
>> cp: cannot stat '/opt/PostgreSQL/9.2/data/archive/000000010000000100000021': No such file or directory
>> cp: cannot stat '/opt/PostgreSQL/9.2/data/archive/00000002.history': No such file or directory
>> 2013-01-11 10:22:06 EST FATAL:  could not connect to the primary server: could not connect to server: No route to host
>>                 Is the server running on host "172.16.0.110" and accepting
>>                 TCP/IP connections on port 5432?
>>
>> If I put the IP of one fo the servers (former master before pacemaker), I get the following:
>>
>> cp: cannot stat '/opt/PostgreSQL/9.2/data/archive/000000010000000100000021': No such file or directory
>> cp: cannot stat '/opt/PostgreSQL/9.2/data/archive/000000010000000100000021': No such file or directory
>> cp: cannot stat '/opt/PostgreSQL/9.2/data/archive/00000002.history': No such file or directory
>> 2013-01-11 10:22:06 EST FATAL:  could not connect to the primary server: could not connect to server: No route to host
>>                 Is the server running on host "172.16.0.110" and accepting
>>                 TCP/IP connections on port 5432?
>>
>> So I gues there are two issues, one is it appears that postgres will not come up unless the IP is in place?? Second is where do you define the user=postgres and password=<password>. Shold that be in the primary_conninfo_opt along with the keepalives?
>>
>> Thanks again for your help. it is greatly appreciated.
>>
>> Keith
>> ________________________________________
>> From: Takatoshi MATSUO [matsuo.tak at gmail.com]
>> Sent: Thursday, January 10, 2013 7:20 PM
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Postresql streaming replication support
>>
>> Hi Keith
>>
>>> So the master-ip is the virtual IP?
>>
>> Yes.
>> If Master is switched, static IP is changed.
>> So it needs virtual IP or multiple instance_attributes.
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_control_resource_options.html
>>
>> Regards,
>> Takatoshi MATSUO
>>
>> 2013/1/10 Keith Ouellette <Keith.Ouellette at airgas.com>:
>>> Thank you again, So the master-ip is the virtual IP?  I was thinking that was the IP address of the master database server. Thank you for this. I am going to try this and see if I can get it to function.
>>>
>>> Thanks,
>>> Keith
>>>
>>>
>>> ________________________________________
>>> From: Takatoshi MATSUO [matsuo.tak at gmail.com]
>>> Sent: Wednesday, January 09, 2013 11:50 PM
>>> To: The Pacemaker cluster resource manager
>>> Subject: Re: [Pacemaker] Postresql streaming replication support
>>>
>>> Hi Keith
>>>
>>> Do you want to start "ClusterIP" on Slave ?
>>> It seems that ClusterIP is based on vip-slave of my sample configuration.
>>>
>>> If you would like to start it on Master, you need to customize "vip-master".
>>> And pgsql's master-ip parameter needs the IP of vip-master which
>>> combines with vip-rep.
>>>
>>> like this
>>> ----------------------------------------------------
>>> property \
>>>     no-quorum-policy="ignore" \
>>>     stonith-enabled="false" \
>>>     crmd-transition-delay="0s"
>>>
>>> rsc_defaults \
>>>     resource-stickiness="INFINITY" \
>>>     migration-threshold="1"
>>>
>>> ms msPostgresql pgsql \
>>>     meta \
>>>         master-max="1" \
>>>         master-node-max="1" \
>>>         clone-max="2" \
>>>         clone-node-max="1" \
>>>         notify="true"
>>>
>>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>>     params \
>>>         ip="172.16.0.110" \
>>>         cidr_netmask="24" \
>>>     op start   timeout="60s" interval="0s"  on-fail="stop" \
>>>     op monitor timeout="60s" interval="10s" on-fail="restart" \
>>>     op stop    timeout="60s" interval="0s"  on-fail="block"
>>>
>>> primitive pgsql ocf:heartbeat:pgsql \
>>>     params \
>>>         pgctl="/opt/PostgreSQL/9.2/bin/pg_ctl" \
>>>         psql="/opt/PostgreSQL/9.2/bin/psql" \
>>>         pgdata="/opt/PostgreSQL/9.2/data/" \
>>>         start_opt="-p 5432" \
>>>         rep_mode="sync" \
>>>         node_list="test-db1 test-db2" \
>>>         restore_command="cp /opt/PostgreSQL/9.2/data/archive/%f %p" \
>>>         primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5
>>> keepalives_count=5" \
>>>         master_ip="172.16.0.110" \
>>>         stop_escalate="0" \
>>>     op start   timeout="60s" interval="0s"  on-fail="restart" \
>>>     op monitor timeout="60s" interval="11s" on-fail="restart" \
>>>     op monitor timeout="60s" interval="10s"  on-fail="restart" role="Master" \
>>>     op promote timeout="60s" interval="0s"  on-fail="restart" \
>>>     op demote  timeout="60s" interval="0s"  on-fail="block" \
>>>     op stop    timeout="60s" interval="0s"  on-fail="block" \
>>>     op notify  timeout="60s" interval="0s"
>>>
>>> colocation rsc_colocation-1 inf: ClusterIP        msPostgresql:Master
>>>
>>> order rsc_order-1 0: msPostgresql:promote  ClusterIP:start   symmetrical=false
>>> order rsc_order-2 0: msPostgresql:demote   ClusterIP:stop    symmetrical=false
>>> ----------------------------------------------------
>>>
>>> Please see this to operate it.
>>> https://github.com/t-matsuo/resource-agents/wiki/Operation-examples-for-none-shared-wal-archives-environment
>>>
>>> --
>>> Takatoshi
>>>
>>> 2013/1/10 Keith Ouellette <Keith.Ouellette at airgas.com>:
>>>> Thank you. I wish I could have seen the presentation. In my case, the two nodes are in two different US cities connected by a bridged VPN. I do not have the ability to have separate connection for pacemaker, replication and access.
>>>>
>>>>
>>>>
>>>> I tried to scale the configuration back to support one VIP, Here is the XML output of what I tried, but the service does not even come up.
>>>>
>>>> <resources>
>>>>       <primitive class="ocf" id="ClusterIP" provider="heartbeat" type="IPaddr2">
>>>>         <meta_attributes id="ClusterIP-meta_attributes">
>>>>           <nvpair id="ClusterIP-meta_attributes-target-role" name="target-role" value="Started"/>
>>>>         </meta_attributes>
>>>>         <operations id="ClusterIP-operations">
>>>>           <op id="ClusterIP-op-monitor-10s" interval="10s" name="monitor" timeout="20s"/>
>>>>         </operations>
>>>>         <instance_attributes id="ClusterIP-instance_attributes">
>>>>           <nvpair id="ClusterIP-instance_attributes-ip" name="ip" value="172.16.0.110"/>
>>>>           <nvpair id="ClusterIP-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/>
>>>>         </instance_attributes>
>>>>       </primitive>
>>>>       <primitive class="ocf" id="pgsql" provider="heartbeat" type="pgsql">
>>>>         <meta_attributes id="pgsql-meta_attributes">
>>>>           <nvpair id="pgsql-meta_attributes-target-role" name="target-role" value="Started"/>
>>>>         </meta_attributes>
>>>>         <operations id="pgsql-operations">
>>>>           <op id="pgsql-op-start-0" interval="0" name="start" on-fail="restart" timeout="120"/>
>>>>           <op id="pgsql-op-monitor-7" interval="7" name="monitor" on-fail="restart" timeout="60"/>
>>>>           <op id="pgsql-op-monitor-Master-60" interval="60" name="monitor" on-fail="restart" role="Master" timeout="30"/>
>>>>           <op id="pgsql-op-promote-0" interval="0" name="promote" on-fail="restart" timeout="60"/>
>>>>           <op id="pgsql-op-stop-0" interval="0" name="stop" on-fail="block" timeout="60"/>
>>>>           <op id="pgsql-op-demote-0" interval="0" name="demote" on-fail="block" timeout="60"/>
>>>>           <op id="pgsql-op-notify-0" interval="0" name="notify" timeout="60"/>
>>>>         </operations>
>>>>         <instance_attributes id="pgsql-instance_attributes">
>>>>           <nvpair id="pgsql-instance_attributes-pgctl" name="pgctl" value="/opt/PostgreSQL/9.2/bin/pg_ctl"/>
>>>>           <nvpair id="pgsql-instance_attributes-psql" name="psql" value="/opt/PostgreSQL/9.2/bin/psql"/>
>>>>           <nvpair id="pgsql-instance_attributes-pgdata" name="pgdata" value="/opt/PostgreSQL/9.2/data/"/>
>>>>           <nvpair id="pgsql-instance_attributes-start_opt" name="start_opt" value="-p 5432"/>
>>>>           <nvpair id="pgsql-instance_attributes-rep_mode" name="rep_mode" value="sync"/>
>>>>           <nvpair id="pgsql-instance_attributes-node_list" name="node_list" value="test-db1 test-db2"/>
>>>>           <nvpair id="pgsql-instance_attributes-restore_command" name="restore_command" value="cp /opt/PostgreSQL/9.2/data/archive/%f %p"/>
>>>>           <nvpair id="pgsql-instance_attributes-primary_conninfo_opt" name="primary_conninfo_opt" value="keepalives_idle=60 keepalives_interval=5 keepalives_count=5"/>
>>>>           <nvpair id="pgsql-instance_attributes-master_ip" name="master_ip" value="172.16.0.111"/>
>>>>           <nvpair id="pgsql-instance_attributes-stop_escalate" name="stop_escalate" value="0"/>
>>>>         </instance_attributes>
>>>>       </primitive>
>>>>     </resources>
>>>> <constraints>
>>>>       <rsc_location id="rsc-location-1" rsc="ClusterIP">
>>>>         <rule id="rsc-location-1-rule" score="200">
>>>>           <expression attribute="pgsql-status" id="rsc-location-1-rule-pgsql-status" operation="eq" value="HS:sync"/>
>>>>         </rule>
>>>>         <rule id="rsc-location-1-rule-0" score="100">
>>>>           <expression attribute="pgsql-status" id="rsc-location-1-rule-0-pgsql-status" operation="eq" value="PRI"/>
>>>>         </rule>
>>>>         <rule id="rsc-location-1-rule-1" score="-INFINITY">
>>>>           <expression attribute="pgsql-status" id="rsc-location-1-rule-1-pgsql-status" operation="not_defined"/>
>>>>         </rule>
>>>>         <rule boolean-op="and" id="rsc-location-1-rule-2" score="-INFINITY">
>>>>           <expression attribute="pgsql-status" id="rsc-location-1-rule-2-pgsql-status" operation="ne" value="HS:sync"/>
>>>>           <expression attribute="pgsql-status" id="rsc-location-1-rule-2-pgsql-status-0" operation="ne" value="PRI"/>
>>>>         </rule>
>>>>       </rsc_location>
>>>>       <rsc_location id="rsc-location-2" rsc="pgsql">
>>>>         <rule id="rsc-location-2-rule" role="Master" score="200">
>>>>           <expression attribute="#uname" id="rsc-location-2-rule-.uname" operation="eq" value="test-db1"/>
>>>>         </rule>
>>>>         <rule id="rsc-location-2-rule-0" role="Master" score="100">
>>>>           <expression attribute="#uname" id="rsc-location-2-rule-0-.uname" operation="eq" value="test-db2"/>
>>>>         </rule>
>>>>         <rule id="rsc-location-2-rule-1" role="Master" score="-INFINITY">
>>>>           <expression attribute="fail-count-ClusterIP" id="rsc-location-2-rule-1-fail-count-ClusterIP" operation="defined"/>
>>>>         </rule>
>>>>         <rule boolean-op="or" id="rsc-location-2-rule-2" score="-INFINITY">
>>>>           <expression attribute="default_ping_set" id="rsc-location-2-rule-2-default_ping_set" operation="not_defined"/>
>>>>           <expression attribute="default_ping_set" id="rsc-location-2-rule-2-default_ping_set-0" operation="lt" value="100"/>
>>>>         </rule>
>>>>       </rsc_location>
>>>>       <rsc_colocation id="rsc_colo-1" rsc="pgsql" score="INFINITY" with-rsc="ClusterIP"/>
>>>>       <rsc_order first="ClusterIP" first-action="start" id="rsc_order_1" symmetrical="false" then="pgsql" then-action="promote"/>
>>>>     </constraints>
>>>> Any ideas?
>>>>
>>>> Thanks,
>>>> Keith
>>>>
>>>>
>>>> ________________________________________
>>>> From: Takatoshi MATSUO [matsuo.tak at gmail.com]
>>>> Sent: Wednesday, January 09, 2013 8:23 PM
>>>> To: The Pacemaker cluster resource manager
>>>> Subject: Re: [Pacemaker] Postresql streaming replication support
>>>>
>>>> Hi Keith
>>>>
>>>> I don't write detailed documentation.
>>>> But I spoke about it using this slide.
>>>>
>>>>    http://linux-ha.sourceforge.jp/wp/archives/3404
>>>>
>>>> Did you read it?
>>>>
>>>> I translated it into English.
>>>> Please click "English version is here : PG-study" to download it.
>>>> (Sorry for bad translation)
>>>>
>>>>
>>>> Regards,
>>>> Takatoshi MATSUO
>>>>
>>>>
>>>> 2013/1/10 Keith Ouellette <Keith.Ouellette at airgas.com>
>>>>>
>>>>> Jesse,
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>    Thank you for the quick response. Actually yes, the example given at the link below shows several connections between the two (cross-overs).  We only have one connection between the two nodes with a bridged VPN.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The example has an S-LAN, D-LAN and an IC-LAN. I only have one LAN available to me with a bridge VPN between the two routers. I tried to skinny the configuration to a single virtual IP primitive called "ClusterIP", but I think there are some other dependancies in the example that I am not seeing as it did not work, meaning the resources did not start.  Is there another place I can go that may provide more detailed documentation that may help me with my configuration?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Keith
>>>>>
>>>>> ________________________________
>>>>> From: Jesse Hathaway [jesse.hathaway at getbraintree.com]
>>>>> Sent: Wednesday, January 09, 2013 2:09 PM
>>>>> To: The Pacemaker cluster resource manager; Jesse Hathaway
>>>>> Subject: Re: [Pacemaker] Postresql streaming replication support
>>>>>
>>>>> On Wed, Jan 9, 2013 at 12:56 PM, Keith Ouellette <Keith.Ouellette at airgas.com> wrote:
>>>>>>
>>>>>>   Now I am trying to get that integrated with pacemaker. In my searches, I see that pacemaker with DRBD seems to be the prevelant implementation an most documentation was written around that. However, due to our network conditions, I am looking for information on doing this with WAL. I did find a resource agent that was wirtten for what I am doing (https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication). The issue is the example if complex with the explanation written only toward that example. Is there another place that I can get documentation  on this resource agent? Is there another way that this could be accmoplished that someone has tried?
>>>>>
>>>>> Keith we are using the above resource agent in production with streaming replication. Do you have any specific questions about its usage?
>>>>>
>>>>> --
>>>>> Jesse Hathaway, Systems Engineer
>>>>> Braintree
>>>>> 917-418-8423
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org