[Pacemaker] two design questions about active/active mode?

Andrew Beekhof andrew at beekhof.net
Thu Nov 7 19:16:57 EST 2013


On 21 Aug 2013, at 1:50 pm, Wen Wen (NCS) <wenw at ncs.com.sg> wrote:

> Hi all,
> I am practising with a two-node cluster.
> I use CentOS 6.3 x86_64 with Pacemaker, DRBD and GFS2.
> I have tested this many times and now have a design question.
>  
> Here is my crm status on one node after I brought that node from standby back to online.
> I changed node1 from standby to online; node2 was online the whole time.
> The WebIP resource only runs on the old active node (node2) after the failover.
> The Apache resource also only runs on the old active node (node2) after the failover.
> Is this by design? No matter how I edit the configuration,
> once I fail over, the Apache and ClusterIP resources cannot run on both nodes; they only run on the old active node.
>  
> Master/Slave Set: WebDataClone [WebData]
>      Masters: [ node1.test.com node2.test.com ]
> Clone Set: WebFSClone [WebFS]
>      Started: [ node1.test.com node2.test.com ]
> Clone Set: WebSiteClone [WebSite]
>      Started: [ node2.test.com ]
>      Stopped: [ WebSite:1 ]
> Clone Set: WebIP [ClusterIP] (unique)
>      ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started node2.test.com
>      ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started node2.test.com
> stonith_fence_virsh_node1      (stonith:fence_virsh):  Started node2.test.com
> stonith_fence_virsh_node2      (stonith:fence_virsh):  Started node1.test.com
>  
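
For readers following along: the status above is not a bug in itself. With a globally-unique clone, both ClusterIP instances are allowed to run on a single node whenever clone-node-max is 2, and that is what keeps the full CLUSTERIP hash covered while node1 is in standby. A minimal sketch of such a definition in crm shell syntax (the IP address, netmask and hash parameters are placeholders, since the primitive itself is not shown in the post):

    primitive ClusterIP ocf:heartbeat:IPaddr2 \
            params ip=192.168.119.100 cidr_netmask=32 clusterip_hash=sourceip \
            op monitor interval=30s
    clone WebIP ClusterIP \
            meta globally-unique=true clone-max=2 clone-node-max=2

The real question, picked up below, is why the instances stay on node2 instead of spreading back out once node1 returns.
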
>  
> 2. Another issue I explored.
> I think it is because I set the colocation between WebSiteClone and WebIP (Apache and the VIP), so if WebIP only runs on one node after failover,
> the Apache service will also only run on that node.

Right. The real problem is that none of the WebIP instances move back to node1.

If you happened to persist with this (sorry for taking so long to reply, I sometimes struggle to keep up with the volume of email I receive), send me the result of 'cibadmin -Ql' when the cluster is in this state and I will take a look. It may be that more recent versions handle things better.
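
For reference, both the configuration and the current placement can be captured in one go while the cluster is in that state; a minimal sketch (the output file names are just examples):

    # Dump the live CIB (configuration plus status section) from the local node
    cibadmin -Ql > /tmp/cib-after-failover.xml

    # One-shot view of resource placement, for comparison
    crm_mon -1 > /tmp/crm_mon-after-failover.txt
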

> So I deleted the "colocation website-with-ip inf: WebSiteClone WebIP" so that Apache can run on both nodes after failover.
> But all the examples use this colocation, so I really don't know what I should do.
>  
> colocation website-with-ip inf: WebSiteClone WebIP
> order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start
> order WebSite-after-WebFS inf: WebFSClone WebSiteClone
> order apache-after-vip inf: WebIP WebSiteClone
>  
> After I removed the colocation "website-with-ip inf: WebSiteClone WebIP",
> the WebSite service can run on both nodes after failover:
>  
> Online: [ node1.test.com node2.test.com ]
>  
> Master/Slave Set: WebDataClone [WebData]
>      Masters: [ node1.test.com node2.test.com ]
> Clone Set: WebFSClone [WebFS]
>      Started: [ node1.test.com node2.test.com ]
> Clone Set: WebSiteClone [WebSite]
>      Started: [ node1.test.com node2.test.com ]
> Clone Set: WebIP [ClusterIP] (unique)
>      ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started node1.test.com
>      ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started node1.test.com
> stonith_fence_virsh_node1      (stonith:fence_virsh):  Started node2.test.com
> stonith_fence_virsh_node2      (stonith:fence_virsh):  Started node1.test.com
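
A syntax note on the trade-off described above: deleting the colocation is not the only option. A colocation with a finite score is advisory rather than mandatory, so the cluster prefers to keep Apache together with an IP instance but will still start Apache on a node without one. A sketch (the score of 200 is an arbitrary example, not a recommendation):

    colocation website-with-ip 200: WebSiteClone WebIP

Whether that is appropriate depends on whether Apache is of any use on a node that holds no ClusterIP instance and therefore receives none of the cluster IP traffic.
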
>  
> /etc/drbd.d/global_common.conf
> global {
>         usage-count yes;
> }
>  
> common {
>         handlers {
>              fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
>              after-resync-target   "/usr/lib/drbd/crm-unfence-peer.sh";
>         }
>  
>         startup {
>             become-primary-on both;
>         }
>  
>         options {
>         }
>  
>         disk {
>             fencing resource-and-stonith;
>         }
>  
>         net {
>             allow-two-primaries;
>             after-sb-0pri discard-zero-changes;
>             after-sb-1pri discard-secondary;
>             after-sb-2pri disconnect;
>            # protocol C;
>         }
> }
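
One detail worth noting in the DRBD configuration above: allow-two-primaries requires protocol C, and the protocol line is commented out here, so it presumably lives in the per-resource file. A rough sketch of what a matching /etc/drbd.d/<resource>.res could look like for dual-primary (the resource name, device, backing disk and port are placeholders; the addresses are borrowed from the corosync.conf further down purely for illustration):

    resource wwwdata {
            net {
                    protocol C;    # dual-primary (allow-two-primaries) needs protocol C
            }
            device    /dev/drbd1;
            disk      /dev/sdb1;
            meta-disk internal;
            on node1.test.com {
                    address 192.168.119.141:7789;
            }
            on node2.test.com {
                    address 192.168.119.142:7789;
            }
    }
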
> Cluster.conf
> <?xml version="1.0"?>
> <cluster config_version="1" name="pcmk">
>   <logging debug="off"/>
>   <clusternodes>
>     <clusternode name="node1.test.com" nodeid="1">
>     <fence>
>             <method name="pcmk-redirect">
>                  <device name="pcmk" port="node1.test.com"/>
>             </method>
>      </fence>
>    </clusternode>
>     <clusternode name="node2.test.com" nodeid="2">
>      <fence>
>         <method name="pcmk-redirect">
>             <device name="pcmk" port="node2.test.com"/>
>         </method>
>      </fence>
>     </clusternode>
>   </clusternodes>
>   <fencedevices>
>     <fencedevice name="pcmk" agent="fence_pcmk"/>
>    </fencedevices>
> </cluster>
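
For completeness, the two fence_virsh devices that appear in the status output are often paired with location constraints so that each device avoids the node it is meant to fence. A sketch in crm shell syntax (the hypervisor address, login, key path and domain name are placeholders, since this part of the configuration is not included in the post):

    primitive stonith_fence_virsh_node1 stonith:fence_virsh \
            params ipaddr=192.168.119.1 login=root identity_file=/root/.ssh/id_rsa \
                    port=node1 action=reboot \
            op monitor interval=60s
    location l-fence-node1 stonith_fence_virsh_node1 -inf: node1.test.com

The node2 device would be defined the same way, pinned away from node2.test.com.
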
>  
> ------------------ /etc/corosync/corosync.conf -------------
> compatibility: whitetank
> service {
>         # Load the Pacemaker Cluster Resource Manager
>         name: pacemaker
>         clustername:pcmk
>         ver: 1
> }
> aisexec {
>         user: root
>         group: root
> }
> totem {
>     version: 2
>     secauth: on
>     interface {
>         member {
>             memberaddr: 192.168.119.141
>         }
>         member {
>             memberaddr: 192.168.119.142
>         }
>         ringnumber: 0
>         bindnetaddr: 192.168.119.0
>         mcastport: 5405
>         ttl: 1
>     }
>     transport: udpu
> }
>  
> logging {
>     fileline: off
>     to_logfile: yes
>     to_syslog: yes
>     debug: on
>     logfile: /var/log/cluster/corosync.log
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>     }
> }
> -------------------------------------
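
With transport udpu and explicit member addresses as above, membership can be sanity-checked on each node with the standard corosync 1.x tools; a quick sketch:

    # Ring status for ring 0 (should report no faults)
    corosync-cfgtool -s

    # List the members corosync currently sees
    corosync-objctl | grep member
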
>  
> Wen Wen 
> Enterprise Management Services | EDMS Global Delivery
> M +65. | E wenw at ncs.com.sg
