[Pacemaker] pgsql RA

Евгений Селявка evg.selyavka at gmail.com
Mon May 19 16:25:04 UTC 2014


Dear users,

I use the Pacemaker pgsql resource agent (RA) script, which supports replication,
in a setup with two identical servers and no shared storage. This is my environment:

Linux billing-db1 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 19
18:37:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[root@billing-db1 ~]# cat /etc/issue
CentOS release 6.4 (Final)
Kernel \r on an \m

[root@billing-db1 ~]# rpm -qa | grep pacemaker
pacemaker-libs-1.1.10-14.el6_5.1.x86_64
pacemaker-cli-1.1.10-14.el6_5.1.x86_64
pacemaker-1.1.10-14.el6_5.1.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64


[root@billing-db1 ~]# rpm -qa | grep corosync
corosynclib-1.4.1-15.el6_4.1.x86_64
corosync-1.4.1-15.el6_4.1.x86_64

[root@billing-db1 ~]# rpm -qa | grep clus
clusterlib-3.0.12.1-49.el6_4.2.x86_64

PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit

The pgsql RA is from https://github.com/t-matsuo/resource-agents

This is the cluster log output for the command pcs resource cleanup msPostgresql:

May 19 18:59:51 [4136] billing-db1       crmd:     info:
delete_resource:       Removing resource pgsql for
95f5d2f4-ca62-4966-8cc2-89c2b9898a50 (internal) on billing-db2
May 19 18:59:51 [4133] billing-db1       lrmd:     info:
cancel_recurring_action:       Cancelling operation
pgsql_monitor_10000
May 19 18:59:51 [4134] billing-db1      attrd:   notice:
attrd_cs_dispatch:     Update relayed from billing-db2
May 19 18:59:51 [4136] billing-db1       crmd:     info:
lrm_remove_deleted_op:         Removing op pgsql_monitor_10000:317 for
deleted resource pgsql
May 19 18:59:51 [4136] billing-db1       crmd:     info:
notify_deleted:        Notifying 95f5d2f4-ca62-4966-8cc2-89c2b9898a50
on billing-db2 that pgsql was deleted
May 19 18:59:51 [4134] billing-db1      attrd:   notice:
attrd_cs_dispatch:     Update relayed from billing-db2
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Forwarding cib_delete operation for section
//node_state[@uname='billing-db1']//lrm_resource[@id='pgsql'] to
master (origin=local/crmd/183)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_query operation for section
//cib/configuration/crm_config//cluster_property_set//nvpair[@name='last-lrm-refresh']:
OK (rc=0, origin=local/crmd/184, version=0.72.1)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Forwarding cib_modify operation for section
crm_config to master (origin=local/crmd/185)
May 19 18:59:51 [4133] billing-db1       lrmd:     info:
process_lrmd_get_rsc_info:     Resource 'pgsql' not found (3 active
resources)
May 19 18:59:51 [4136] billing-db1       crmd:     info:
process_lrm_event:     LRM operation pgsql_monitor_10000 (call=317,
status=1, cib-update=0, confirmed=true) Cancelled
May 19 18:59:51 [4136] billing-db1       crmd:     info:
update_history_cache:  Resource pgsql no longer exists, not updating
cache
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_apply_diff operation for section
//node_state[@uname='billing-db2']//lrm_resource[@id='pgsql']: OK
(rc=0, origin=billing-db2/crmd/2058, version=0.72.2)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_apply_diff operation for section
//node_state[@uname='billing-db1']//lrm_resource[@id='pgsql']: OK
(rc=0, origin=billing-db2/crmd/183, version=0.72.3)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_apply_diff operation for section
crm_config: OK (rc=0, origin=billing-db2/crmd/185, version=0.73.1)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_query operation for section
crm_config: OK (rc=0, origin=local/crmd/186, version=0.73.1)
May 19 18:59:51 [4136] billing-db1       crmd:     info:
plugin_handle_membership:      Membership 1192: quorum retained
May 19 18:59:51 [4131] billing-db1        cib:     info:
write_cib_contents:    Archived previous version as
/var/lib/pacemaker/cib/cib-27.raw
May 19 18:59:51 [4131] billing-db1        cib:     info:
write_cib_contents:    Wrote version 0.73.0 of the CIB to disk
(digest: 76e018df11f3d006d9ba3e0d6fc28225)
May 19 18:59:51 [4133] billing-db1       lrmd:     info:
process_lrmd_get_rsc_info:     Resource 'pgsql' not found (3 active
resources)
May 19 18:59:51 [4133] billing-db1       lrmd:     info:
process_lrmd_get_rsc_info:     Resource 'pgsql:0' not found (3 active
resources)
May 19 18:59:51 [4133] billing-db1       lrmd:     info:
process_lrmd_rsc_register:     Added 'pgsql' to the rsc list (4 active
resources)
May 19 18:59:51 [4136] billing-db1       crmd:     info:
do_lrm_rsc_op:         Performing
key=8:444:7:c52110df-29fb-41e4-9f4b-fa87787771ba op=pgsql_monitor_0
May 19 18:59:51 [4131] billing-db1        cib:     info: retrieveCib:
 Reading cluster configuration from: /var/lib/pacemaker/cib/cib.MjStjO
(digest: /var/lib/pacemaker/cib/cib.z5hNP8)
May 19 18:59:51 [4131] billing-db1        cib:     info:
crm_client_new:        Connecting 0x1b88300 for uid=0 gid=0 pid=32084
id=2c3b9838-ac73-4bc0-b8b0-e7c896a55a3f
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_query operation for section
'all': OK (rc=0, origin=local/crm_mon/2, version=0.73.1)
May 19 18:59:51 [4131] billing-db1        cib:     info:
crm_client_destroy:    Destroying 0 events
May 19 18:59:51 [4131] billing-db1        cib:     info:
crm_client_new:        Connecting 0x1b88300 for uid=0 gid=0 pid=32095
id=78412cee-994e-4efc-a268-f361c5039681
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_query operation for section
nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.73.1)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_query operation for section
//cib/configuration/nodes//node[@id='billing-db1']//instance_attributes//nvpair[@name='pgsql-data-status']:
OK (rc=0, origin=local/crm_attribute/3, version=0.73.1)
May 19 18:59:51 [4131] billing-db1        cib:     info:
crm_client_destroy:    Destroying 0 events
May 19 18:59:51 [4133] billing-db1       lrmd:   notice:
operation_finished:    pgsql_monitor_0:32032:stderr [
2014/05/19_18:59:51 INFO: Don't check /var/lib/pgsql/9.1/data/ during
probe ]
May 19 18:59:51 [4136] billing-db1       crmd:   notice:
process_lrm_event:     LRM operation pgsql_monitor_0 (call=326, rc=0,
cib-update=187, confirmed=true) ok
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Forwarding cib_modify operation for section
status to master (origin=local/crmd/187)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_apply_diff operation for section
status: OK (rc=0, origin=billing-db2/crmd/187, version=0.73.2)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_apply_diff operation for section
status: OK (rc=0, origin=billing-db2/crmd/2067, version=0.73.3)
May 19 18:59:51 [4136] billing-db1       crmd:     info:
do_lrm_rsc_op:         Performing
key=15:445:0:c52110df-29fb-41e4-9f4b-fa87787771ba
op=pgsql_monitor_10000
May 19 18:59:51 [4131] billing-db1        cib:     info:
crm_client_new:        Connecting 0x1beca10 for uid=0 gid=0 pid=32155
id=057f384e-4142-4cfd-8df3-91d1262ecf01
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_query operation for section
'all': OK (rc=0, origin=local/crm_mon/2, version=0.73.3)
May 19 18:59:51 [4131] billing-db1        cib:     info:
crm_client_destroy:    Destroying 0 events
May 19 18:59:51 [4131] billing-db1        cib:     info:
crm_client_new:        Connecting 0x1beca10 for uid=0 gid=0 pid=32166
id=5f86fae5-b028-4c63-94a2-461d8adfd1b0
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_query operation for section
nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.73.3)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_query operation for section
//cib/configuration/nodes//node[@id='billing-db1']//instance_attributes//nvpair[@name='pgsql-data-status']:
OK (rc=0, origin=local/crm_attribute/3, version=0.73.3)
May 19 18:59:51 [4131] billing-db1        cib:     info:
crm_client_destroy:    Destroying 0 events
May 19 18:59:51 [4136] billing-db1       crmd:   notice:
process_lrm_event:     LRM operation pgsql_monitor_10000 (call=329,
rc=0, cib-update=188, confirmed=false) ok
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Forwarding cib_modify operation for section
status to master (origin=local/crmd/188)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_apply_diff operation for section
status: OK (rc=0, origin=billing-db2/crmd/188, version=0.73.4)
May 19 18:59:51 [4131] billing-db1        cib:     info:
cib_process_request:   Completed cib_apply_diff operation for section
status: OK (rc=0, origin=billing-db2/crmd/2069, version=0.73.5)

The problem is that one of the servers stays in the HS:alone status
permanently after a failover occurs. I tried to set the attributes by hand
with the commands crm_attribute -l forever -N billing-db1 -n
"pgsql-data-status" -v "STREAMING|SYNC" and crm_attribute -l forever
-N billing-db1 -n "pgsql-status" -v "HS:sync". I also manually added
the string 'billing-db1' to the file /var/lib/pgsql/tmp/rep_mode.conf
on the master server, billing-db2, and replication really is running:

user=postgres,db=postgres@[local]:5433=# SELECT
application_name,state,sync_state from pg_stat_replication ;
┌──────────────────┬───────────┬────────────┐
│ application_name │   state   │ sync_state │
├──────────────────┼───────────┼────────────┤
│ billing-db1      │ streaming │ sync       │
└──────────────────┴───────────┴────────────┘

[root@billing-db2 ~]# cat ~postgres/tmp/rep_mode.conf
synchronous_standby_names = 'billing-db1'

The two servers are now in synchronous replication, but the script does not detect this.
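
For comparing what PostgreSQL reports with what the cluster shows, here is a
minimal sketch. It assumes that the pgsql-data-status value is simply the
upper-cased state and sync_state columns joined by "|", which matches the
values shown in this post but is not taken from the agent source. The function
name repl_rows_to_data_status is hypothetical.

```shell
# Read unaligned psql output (fields separated by '|') on stdin, one
# pg_stat_replication row per line: application_name|state|sync_state.
# Print "<application_name> <STATE>|<SYNC_STATE>" for each row.
# Assumption: this mirrors the format of the pgsql-data-status attribute.
repl_rows_to_data_status() {
    while IFS='|' read -r app state sync; do
        # Skip empty lines.
        [ -n "$app" ] || continue
        printf '%s %s|%s\n' "$app" \
            "$(printf '%s' "$state" | tr '[:lower:]' '[:upper:]')" \
            "$(printf '%s' "$sync" | tr '[:lower:]' '[:upper:]')"
    done
}
```

On the master it could be fed like this (port 5433 as in the psql prompt above):

psql -U postgres -p 5433 -At -c "SELECT application_name,state,sync_state FROM pg_stat_replication" | repl_rows_to_data_status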

[root@billing-db2 ~]# crm_mon -Af1
Last updated: Mon May 19 19:20:00 2014
Last change: Mon May 19 19:11:56 2014 via crmd on billing-db1
Stack: classic openais (with plugin)
Current DC: billing-db2 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured, 2 expected votes
6 Resources configured


Online: [ billing-db1 billing-db2 ]

 vip-master     (ocf::heartbeat:IPaddr2):       Started billing-db2
 vip-slave      (ocf::heartbeat:IPaddr2):       Started billing-db2
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ billing-db2 ]
     Slaves: [ billing-db1 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ billing-db1 billing-db2 ]

Node Attributes:
* Node billing-db1:
    + default_ping_set                  : 100
    + master-pgsql                      : 1000
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:alone
    + pgsql-xlog-loc                    : 0000002616E52B88
* Node billing-db2:
    + default_ping_set                  : 100
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000002606B0B2C8
    + pgsql-status                      : PRI

Migration summary:
* Node billing-db1:
* Node billing-db2:
As you can see, pgsql-data-status on billing-db1 is STREAMING|SYNC, so I
expect the pgsql-status attribute to be HS:sync, but it is still HS:alone.
On both servers I don't see anything except rep_mode.conf in ~postgres/tmp/
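
The mismatch described here can be spotted mechanically. This is a rough
sketch that reads crm_mon -Af1 output and lists every node whose
pgsql-data-status is STREAMING|SYNC while pgsql-status is not HS:sync. The
function name find_status_mismatch is hypothetical, and the parsing assumes
the "Node Attributes" layout shown above.

```shell
# Read `crm_mon -Af1` output on stdin and print one line for every node
# whose pgsql-data-status is STREAMING|SYNC but whose pgsql-status is
# something other than HS:sync.
find_status_mismatch() {
    awk '
        # Report the node collected so far if its attributes disagree.
        function report() {
            if (node != "" && data == "STREAMING|SYNC" && status != "HS:sync")
                printf "%s: pgsql-data-status=%s pgsql-status=%s\n", node, data, status
        }
        # A "* Node <name>:" line starts a new node block.
        /^\* Node / {
            report()
            node = $3
            sub(/:$/, "", node)
            data = ""
            status = ""
            next
        }
        # Attribute lines look like "+ <name> : <value>".
        $2 == "pgsql-data-status" { data = $4 }
        $2 == "pgsql-status"      { status = $4 }
        END { report() }
    '
}
```

Usage: crm_mon -Af1 | find_status_mismatch. With the output above it would
print "billing-db1: pgsql-data-status=STREAMING|SYNC pgsql-status=HS:alone".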

The only option I see is to erase the configuration on both nodes and
apply a new one, but that will cause downtime.

Where might the problem be? Could you help with some advice?

-- 
Best Regards,
Seliavka Evgenii



