[Pacemaker] [Problem] crmd restarts when cibadmin is given a wrong parameter.

renayama19661014 at ybb.ne.jp
Wed Nov 5 23:13:12 CET 2014


Hi All,

One of our users ran the cibadmin command with a wrong parameter by mistake.
As a result of this operator error, crmd aborted and was restarted.

Step 1) Start a cluster.

[root at rh70-node1 ~]# crm_mon -1 -Af
Last updated: Wed Nov  5 10:26:51 2014
Last change: Wed Nov  5 10:23:39 2014
Stack: corosync
Current DC: rh70-node1 (3232238160) - partition WITHOUT quorum
Version: 1.1.12-85c093e
1 Nodes configured
0 Resources configured


Online: [ rh70-node1 ]


Node Attributes:
* Node rh70-node1:

Migration summary:
* Node rh70-node1: 

Step 2) The user adds a node with an incorrect definition (the id is a host name rather than a numeric node id).

cibadmin -C -o nodes -X '<node id="hpg604" type="normal" uname="hpg604"/>'




crmd dumps core and is respawned by pacemakerd.

----------------------------
Nov  5 10:28:17 rh70-node1 cib[2167]: info: cib_process_request: Forwarding cib_create operation for section nodes to master (origin=local/cibadmin/2)
Nov  5 10:28:17 rh70-node1 cib[2167]: info: cib_perform_op: Diff: --- 0.2.7 2
Nov  5 10:28:17 rh70-node1 cib[2167]: info: cib_perform_op: Diff: +++ 0.3.0 92153f86c58ed569196d946612f0dab8
Nov  5 10:28:17 rh70-node1 cib[2167]: info: cib_perform_op: +  /cib:  @epoch=3, @num_updates=0
Nov  5 10:28:17 rh70-node1 cib[2167]: info: cib_perform_op: ++ /cib/configuration/nodes:  <node id="hpg604" type="normal" uname="hpg604"/>
Nov  5 10:28:17 rh70-node1 cib[2167]: info: cib_process_request: Completed cib_create operation for section nodes: OK (rc=0, origin=rh70-node1/cibadmin/2, version=0.3.0)
Nov  5 10:28:17 rh70-node1 crmd[2172]: error: crm_int_helper: Characters left over after parsing 'hpg604': 'hpg604'
Nov  5 10:28:17 rh70-node1 crmd[2172]: error: crm_abort: crm_find_peer: Triggered fatal assert at membership.c:338 : id > 0 || uname != NULL
Nov  5 10:28:17 rh70-node1 cib[2223]: info: write_cib_contents: Archived previous version as /var/lib/pacemaker/cib/cib-2.raw
Nov  5 10:28:17 rh70-node1 cib[2223]: info: write_cib_contents: Wrote version 0.3.0 of the CIB to disk (digest: fd92fe00a0f0478246b1c9f1d2be83a8)
Nov  5 10:28:17 rh70-node1 cib[2223]: info: retrieveCib: Reading cluster configuration from: /var/lib/pacemaker/cib/cib.CARj72 (digest: /var/lib/pacemaker/cib/cib.XK4ybJ)
Nov  5 10:28:17 rh70-node1 abrt-hook-ccpp: Saved core dump of pid 2172 (/usr/libexec/pacemaker/crmd) to /var/tmp/abrt/ccpp-2014-11-05-10:28:17-2172 (18141184 bytes)
Nov  5 10:28:18 rh70-node1 abrt-server: Executable '/usr/libexec/pacemaker/crmd' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Nov  5 10:28:18 rh70-node1 abrt-server: 'post-create' on '/var/tmp/abrt/ccpp-2014-11-05-10:28:17-2172' exited with 1
Nov  5 10:28:18 rh70-node1 abrt-server: Deleting problem directory '/var/tmp/abrt/ccpp-2014-11-05-10:28:17-2172'
Nov  5 10:28:18 rh70-node1 pacemakerd[2166]: error: child_waitpid: Managed process 2172 (crmd) dumped core
Nov  5 10:28:18 rh70-node1 pacemakerd[2166]: error: pcmk_child_exit: The crmd process (2172) terminated with signal 6 (core=1)
Nov  5 10:28:18 rh70-node1 pacemakerd[2166]: notice: pcmk_process_exit: Respawning failed child process: crmd
Nov  5 10:28:18 rh70-node1 pacemakerd[2166]: info: start_child: Using uid=992 and group=990 for process crmd
Nov  5 10:28:18 rh70-node1 pacemakerd[2166]: info: start_child: Forked child 2228 for process crmd
Nov  5 10:28:18 rh70-node1 crmd[2228]: info: crm_log_init: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
Nov  5 10:28:18 rh70-node1 crmd[2228]: notice: main: CRM Git Version: 85c093e
Nov  5 10:28:18 rh70-node1 crmd[2228]: info: do_log: FSA: Input I_STARTUP from crmd_init() received in state S_STARTING
Nov  5 10:28:18 rh70-node1 crmd[2228]: info: get_cluster_type: Verifying cluster type: 'corosync'
----------------------------

This was an operator error, but it is not desirable for crmd to abort and restart because of it.

We request an improvement so that crmd does not restart in this case.
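
Until crmd handles this case, a wrapper on the operator side could reject such input before it reaches the CIB. The following is only a minimal sketch under assumptions, not something Pacemaker provides: the function names is_numeric_node_id and safe_add_node are hypothetical, and the rule that the node id must be a positive decimal number is our assumption for the corosync stack, based on the crm_int_helper error in the log above. The cibadmin invocation itself is the same one used in Step 2.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): succeeds only if the
# argument is a non-empty string of decimal digits with at least one
# non-zero digit. Assumption: the corosync stack expects such an id.
is_numeric_node_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *[1-9]*)     return 0 ;;   # digits only, and greater than zero
        *)           return 1 ;;   # digits only, but all zeros
    esac
}

# Hypothetical wrapper: validates the id first and calls cibadmin only
# when the check passes. Usage: safe_add_node <id> <uname>
safe_add_node() {
    node_id=$1
    node_uname=$2
    if ! is_numeric_node_id "$node_id"; then
        echo "refusing to add node: id '$node_id' is not a positive number" >&2
        return 1
    fi
    cibadmin -C -o nodes -X "<node id=\"$node_id\" type=\"normal\" uname=\"$node_uname\"/>"
}
```

With this wrapper, "safe_add_node hpg604 hpg604" prints a refusal and returns 1 without calling cibadmin. It only covers this one mistake; it does not replace a fix in crmd.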

Best Regards,
Hideo Yamauchi.



