<div dir="ltr">Hey folks, <div><br></div><div>After a few battles with it, I managed to get the pgsql RA running on 4 nodes. It all works great, however...</div><div>When testing failover, I unplugged the &#39;master&#39; machine. The slaves get sorted out and a new master is elected, but the slaves then don&#39;t reconnect to the new master. </div>
<div>They all complain about missing files in pg_archive, which I was told to ignore. </div><div style>But they still don&#39;t reconnect to the new master to keep replication going. </div><div style><br></div><div style>
<br></div><div><div>cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/00000007000000010000003F&#39;: No such file or directory</div><div>cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/00000007000000010000003F&#39;: No such file or directory</div>
<div>cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/00000008.history&#39;: No such file or directory</div><div>FATAL:  timeline 8 of the primary does not match recovery target timeline 7</div><div><br></div></div><div>
<br></div><div style>It&#39;s the last line that worries me. </div><div style>Replication stays broken until I manually run rsync to sync pg_archive up with the master. </div><div style><br></div><div style>Not sure where I went wrong. </div>
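<div style><br></div><div style>A sketch of that manual sync, for reference. The source host is an assumption (hanode03 is the node marked LATEST in the crm config in this post), the rsync flags are illustrative, and the helper name is made up; the path is the one from restore_command. It only prints the command, so it can be checked before being run on a slave:</div>

```shell
#!/bin/sh
# Archive directory taken from restore_command / archive_command in this post.
ARCHIVE_DIR="/var/lib/pgsql/9.2/data/pg_archive"

# Hypothetical helper: print the rsync command that pulls archived WAL
# segments and timeline .history files from the given host.
sync_archive_cmd() {
    printf 'rsync -av %s:%s/ %s/\n' "$1" "$ARCHIVE_DIR" "$ARCHIVE_DIR"
}

# Assumed source host: the node currently marked LATEST (the new master).
sync_archive_cmd hanode03
```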
<div style><br></div><div style>Here&#39;s my crm config:</div><div style><br></div><div style><div>node hanode01 \</div><div>        attributes pgsql-data-status=&quot;DISCONNECT&quot; kernel=&quot;2.6.32-279.el6.x86_64&quot; foobar=&quot;barfoo&quot;</div>
<div>node hanode02 \</div><div>        attributes pgsql-data-status=&quot;DISCONNECT&quot;</div><div>node hanode03 \</div><div>        attributes pgsql-data-status=&quot;LATEST&quot;</div><div>node hanode04 \</div><div>        attributes pgsql-data-status=&quot;DISCONNECT&quot;</div>
<div>primitive pgsql ocf:heartbeat:pgsql \</div><div>        params pgctl=&quot;/usr/pgsql-9.2/bin/pg_ctl&quot; psql=&quot;/usr/pgsql-9.2/bin/psql&quot; pgdata=&quot;/var/lib/pgsql/9.2/data/&quot; restore_command=&quot;cp /var/lib/pgsql/9.2/data/pg_archive/\%f \%p&quot; start_opt=&quot;-p 5432&quot; rep_mode=&quot;async&quot; node_list=&quot;hanode01 hanode02 hanode03 hanode04&quot; master_ip=&quot;10.0.1.100&quot; stop_escalate=&quot;0&quot; repuser=&quot;replicator&quot; monitor_password=&quot;lemon31ee7&quot; monitor_user=&quot;monitor&quot; \</div>
<div>        op start interval=&quot;0s&quot; role=&quot;Master&quot; timeout=&quot;260s&quot; on-fail=&quot;restart&quot; \</div><div>        op monitor interval=&quot;2s&quot; role=&quot;Master&quot; timeout=&quot;260s&quot; on-fail=&quot;restart&quot; \</div>
<div>        op monitor interval=&quot;7s&quot; timeout=&quot;260s&quot; on-fail=&quot;restart&quot; \</div><div>        op promote interval=&quot;0s&quot; timeout=&quot;260s&quot; on-fail=&quot;restart&quot; \</div><div>
        op demote interval=&quot;0s&quot; timeout=&quot;260s&quot; on-fail=&quot;stop&quot; \</div><div>        op stop interval=&quot;0s&quot; timeout=&quot;260s&quot; on-fail=&quot;block&quot; \</div><div>        op notify interval=&quot;0s&quot; timeout=&quot;260s&quot;</div>
<div>primitive vip-master ocf:heartbeat:IPaddr2 \</div><div>        params ip=&quot;10.0.0.100&quot; nic=&quot;eth1&quot; cidr_netmask=&quot;24&quot; \</div><div>        op start interval=&quot;0s&quot; timeout=&quot;260s&quot; on-fail=&quot;restart&quot; \</div>
<div>        op monitor interval=&quot;10s&quot; timeout=&quot;260s&quot; on-fail=&quot;restart&quot; \</div><div>        op stop interval=&quot;0s&quot; timeout=&quot;260s&quot; on-fail=&quot;block&quot;</div><div>primitive vip-rep ocf:heartbeat:IPaddr2 \</div>
<div>        params ip=&quot;10.0.1.100&quot; nic=&quot;eth2&quot; cidr_netmask=&quot;24&quot; \</div><div>        op start interval=&quot;0s&quot; timeout=&quot;260s&quot; on-fail=&quot;restart&quot; \</div><div>        op monitor interval=&quot;10s&quot; timeout=&quot;260s&quot; on-fail=&quot;restart&quot; \</div>
<div>        op stop interval=&quot;0s&quot; timeout=&quot;260s&quot; on-fail=&quot;block&quot;</div><div>group master-group vip-master vip-rep</div><div>ms msPostgresql pgsql \</div><div>        meta master-max=&quot;1&quot; master-node-max=&quot;1&quot; clone-max=&quot;10&quot; clone-node-max=&quot;1&quot; notify=&quot;true&quot; target-role=&quot;Master&quot;</div>
<div>colocation rsc_colocation-2 inf: master-group msPostgresql:Master</div><div>order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false</div><div>order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false</div>
<div>property $id=&quot;cib-bootstrap-options&quot; \</div><div>        dc-version=&quot;1.1.9-1512.el6-2a917dd&quot; \</div><div>        cluster-infrastructure=&quot;classic openais (with plugin)&quot; \</div><div>        expected-quorum-votes=&quot;4&quot; \</div>
<div>        stonith-enabled=&quot;false&quot; \</div><div>        no-quorum-policy=&quot;ignore&quot; \</div><div>        last-lrm-refresh=&quot;1376582085&quot;</div><div>rsc_defaults $id=&quot;rsc_defaults-options&quot; \</div>
<div>        resource-stickiness=&quot;INFINITY&quot; \</div><div>        migration-threshold=&quot;5&quot;</div><div><br></div><div><br></div><div><br></div><div style>And the PostgreSQL configuration:</div><div style><div>listen_addresses = &#39;*&#39;</div>
<div>wal_level = hot_standby</div><div>synchronous_commit = on</div><div>archive_mode = on</div><div>archive_command = &#39;cp %p /var/lib/pgsql/9.2/data/pg_archive/%f&#39;</div><div>max_wal_senders=5</div><div>wal_keep_segments = 32</div>
<div>hot_standby = on</div><div>restart_after_crash = off</div><div>replication_timeout = 5000         # mseconds</div><div>wal_receiver_status_interval = 2   # seconds</div><div>max_standby_streaming_delay = -1</div><div>
max_standby_archive_delay = -1</div><div>synchronous_commit = on</div><div>restart_after_crash = off</div><div>hot_standby_feedback = on</div><div><br></div><div><br></div><div style>And pg_hba.conf:</div><div style><br></div><div style>
<div><br></div><div># &quot;local&quot; is for Unix domain socket connections only</div><div>local   all             all                                     trust</div><div># IPv4 local connections:</div><div>host    all             all             127.0.0.1/32            trust</div>
<div># Allow replication connections from localhost, by a user with the</div><div># replication privilege.</div><div>#local   replication     postgres                                peer</div><div>host    replication     postgres        127.0.0.1/32            trust</div>
<div>host    replication     replicator        10.0.0.0/8            trust</div><div>host    all             all             10.0.0.0/8               md5</div>
<div><br></div></div></div><div style><br></div></div><div><br clear="all"><div><br></div>-- <br>GJ
</div></div>