[Pacemaker] Pacemaker installed to custom location

Andrew Beekhof andrew at beekhof.net
Tue Apr 30 04:40:30 UTC 2013


On 26/04/2013, at 9:12 PM, James Masson <james.masson at opencredo.com> wrote:

> 
> 
> On 26/04/13 01:29, Andrew Beekhof wrote:
>> 
>> On 26/04/2013, at 12:12 AM, James Masson <james.masson at opencredo.com> wrote:
>> 
>>> 
>>> Hi list,
>>> 
>>> I'm trying to build and run pacemaker from a custom location.
>>> 
>>> Corosync starts up fine.
>>> 
>>> Pacemakerd does not - the result is:
>> 
>> Try turning up the debug to see why the cib isn't happy:
>> 
>>> Apr 25 13:54:10 [10482] fcde02a2-cc41-4c58-b6d2-b7bb0bada436 pacemakerd:    error: pcmk_child_exit: 	Child process cib exited (pid=10484, rc=100)
>>> Apr 25 13:54:10 [10482] fcde02a2-cc41-4c58-b6d2-b7bb0bada436 pacemakerd:  warning: pcmk_child_exit: 	Pacemaker child process cib no longer
>> 
>> 
>> 
> Hi Andrew,
> 
> debug log + strace are attached. The strace has something interesting...
> 
> 
> 5195  open("/dev/shm/qb-cpg-request-5173-5195-19-header", O_RDWR) = -1 EACCES (Permission denied)
> 
> 
> I know pacemaker uses shm to communicate. Permissions on /dev/shm are (I think) correct.

Looks reasonable (now that I understand vcap :-)

> 
> root at 5627a5e1-9e30-4fe2-9178-6445e26a8ccc:~# ls -al /dev/shm/
> total 8224
> drwxrwx---  2 root vcap      80 2013-04-26 10:30 .
> drwxr-xr-x 12 root root    3900 2013-04-26 08:23 ..
> -rw-------  1 root root 8388608 2013-04-26 10:30 qb-corosync-blackbox-data
> -rw-------  1 root root    8248 2013-04-26 10:28 qb-corosync-blackbox-header
> 
> When I changed permissions on /dev/shm to 777, things got a little further: the CIB stays up, crmd respawns, and I get this over and over again in the logs.
> 
> ##################################
> Apr 26 10:55:52 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:54 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a498e0 for uid=1000 gid=0 pid=5775 id=95b6eca5-a34e-49e5-b0f8-74b84857d690
> Apr 26 10:55:54 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:56 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a498e0 for uid=1000 gid=0 pid=5775 id=117e515b-da4d-4842-9414-7b7d004e5c92
> Apr 26 10:55:56 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a498e0 for uid=1000 gid=0 pid=5775 id=cf7c10b1-14a1-47d1-9e2e-30707254256f
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    error: pcmk_child_exit:      Child process crmd exited (pid=5775, rc=2)

No logs from the crmd?
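If the crmd is dying before it manages to log anything, it may be worth forcing debug output to a known file. A minimal sketch, assuming your custom build still respects the standard PCMK_* environment variables (set these before starting pacemakerd):

    export PCMK_debug=crmd                            # or "yes" for all daemons
    export PCMK_logfile=/var/log/pacemaker-debug.log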

> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    trace: update_node_processes:        Empty uname for node 839122954
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    debug: update_node_processes:        Node 5627a5e1-9e30-4fe2-9178-6445e26a8ccc now has process list: 00000000000000000000000000111112 (was 00000000000000000000000000111312)
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    trace: update_process_clients:       Sending process list to 0 children
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    trace: update_process_peers:         Sending <node uname="5627a5e1-9e30-4fe2-9178-6445e26a8ccc" proclist="1118482"/>
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:   notice: pcmk_process_exit:    Respawning failed child process: crmd
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:     info: start_child:  Forked child 5789 for process crmd
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    trace: update_node_processes:        Empty uname for node 839122954
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    debug: update_node_processes:        Node 5627a5e1-9e30-4fe2-9178-6445e26a8ccc now has process list: 00000000000000000000000000111312 (was 00000000000000000000000000111112)
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    trace: update_process_clients:       Sending process list to 0 children
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    trace: update_process_peers:         Sending <node uname="5627a5e1-9e30-4fe2-9178-6445e26a8ccc" proclist="1118994"/>
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    trace: crm_user_lookup:      Cluster user vcap has uid=1000 gid=1000
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:    trace: mainloop_gio_callback:        New message from corosync-cpg[0x21b1c60]
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=5dfb6f5a-8b53-42f6-b5f5-61e49efa93dd
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a636f0 for uid=1000 gid=0 pid=5789 id=3198d49f-8ff9-4799-9496-1b9aed0de807
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a56cb0 for uid=1000 gid=0 pid=5789 id=2713f990-2533-4fb8-82e0-31e40b1ef577
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a571f0 for uid=1000 gid=0 pid=5789 id=2bf401a2-3bd5-43af-9328-0a53bb61d9f7
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:56:00 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=7233fbec-3633-4a48-8fe7-3028bfa58029
> Apr 26 10:56:00 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:56:02 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=a7b76888-7137-4eb1-888d-d7a3ea273a4f
> Apr 26 10:56:02 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:56:04 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=4fbd695d-902b-4a29-957f-8d36fd072178
> Apr 26 10:56:04 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:56:06 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc       lrmd:     info: crm_client_new:       Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=a3e00689-d842-456d-957a-22e2e4e7eedf
> ##################
> 
> SHM while running...
> 
> #####################
> root at 5627a5e1-9e30-4fe2-9178-6445e26a8ccc:~# ls -al /dev/shm/
> total 34936
> drwxrwxrwx  2 root vcap    1280 2013-04-26 10:57 .
> drwxr-xr-x 12 root root    3900 2013-04-26 08:23 ..
> -rw-------  1 root root 1048576 2013-04-26 10:54 qb-cfg-event-5598-5754-16-data
> -rw-------  1 root root    8248 2013-04-26 10:54 qb-cfg-event-5598-5754-16-header
> -rw-------  1 root root 1048576 2013-04-26 10:54 qb-cfg-request-5598-5754-16-data
> -rw-------  1 root root    8252 2013-04-26 10:54 qb-cfg-request-5598-5754-16-header
> -rw-------  1 root root 1048576 2013-04-26 10:54 qb-cfg-response-5598-5754-16-data
> -rw-------  1 root root    8248 2013-04-26 10:54 qb-cfg-response-5598-5754-16-header
> -rw-rw----  1 vcap root  524288 2013-04-26 10:54 qb-cib_rw-event-5756-5757-9-data
> -rw-rw----  1 vcap root    8248 2013-04-26 10:54 qb-cib_rw-event-5756-5757-9-header
> -rw-rw----  1 vcap root  524288 2013-04-26 10:54 qb-cib_rw-event-5756-5759-10-data
> -rw-rw----  1 vcap root    8248 2013-04-26 10:54 qb-cib_rw-event-5756-5759-10-header
> -rw-rw----  1 vcap root  524288 2013-04-26 10:54 qb-cib_rw-request-5756-5757-9-data
> -rw-rw----  1 vcap root    8252 2013-04-26 10:54 qb-cib_rw-request-5756-5757-9-header
> -rw-rw----  1 vcap root  524288 2013-04-26 10:54 qb-cib_rw-request-5756-5759-10-data
> -rw-rw----  1 vcap root    8252 2013-04-26 10:54 qb-cib_rw-request-5756-5759-10-header
> -rw-rw----  1 vcap root  524288 2013-04-26 10:54 qb-cib_rw-response-5756-5757-9-data
> -rw-rw----  1 vcap root    8248 2013-04-26 10:54 qb-cib_rw-response-5756-5757-9-header
> -rw-rw----  1 vcap root  524288 2013-04-26 10:54 qb-cib_rw-response-5756-5759-10-data
> -rw-rw----  1 vcap root    8248 2013-04-26 10:54 qb-cib_rw-response-5756-5759-10-header
> -rw-rw----  1 vcap root  524288 2013-04-26 10:56 qb-cib_shm-event-5756-5808-7-data
> -rw-rw----  1 vcap root    8248 2013-04-26 10:56 qb-cib_shm-event-5756-5808-7-header
> -rw-rw----  1 vcap root  524288 2013-04-26 10:56 qb-cib_shm-request-5756-5808-7-data
> -rw-rw----  1 vcap root    8252 2013-04-26 10:56 qb-cib_shm-request-5756-5808-7-header
> -rw-rw----  1 vcap root  524288 2013-04-26 10:56 qb-cib_shm-response-5756-5808-7-data
> -rw-rw----  1 vcap root    8248 2013-04-26 10:56 qb-cib_shm-response-5756-5808-7-header
> -rw-------  1 root root 8388608 2013-04-26 10:56 qb-corosync-blackbox-data
> -rw-------  1 root root    8248 2013-04-26 10:47 qb-corosync-blackbox-header
> -rw-------  1 root root 1048576 2013-04-26 10:54 qb-cpg-event-5598-5754-17-data
> -rw-------  1 root root    8248 2013-04-26 10:54 qb-cpg-event-5598-5754-17-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:54 qb-cpg-event-5598-5756-19-data
> -rw-------  1 vcap root    8248 2013-04-26 10:54 qb-cpg-event-5598-5756-19-header
> -rw-------  1 root root 1048576 2013-04-26 10:54 qb-cpg-event-5598-5757-18-data
> -rw-------  1 root root    8248 2013-04-26 10:54 qb-cpg-event-5598-5757-18-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:54 qb-cpg-event-5598-5759-20-data
> -rw-------  1 vcap root    8248 2013-04-26 10:54 qb-cpg-event-5598-5759-20-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:56 qb-cpg-event-5598-5808-21-data
> -rw-------  1 vcap root    8248 2013-04-26 10:56 qb-cpg-event-5598-5808-21-header
> -rw-------  1 root root 1048576 2013-04-26 10:54 qb-cpg-request-5598-5754-17-data
> -rw-------  1 root root    8252 2013-04-26 10:54 qb-cpg-request-5598-5754-17-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:54 qb-cpg-request-5598-5756-19-data
> -rw-------  1 vcap root    8252 2013-04-26 10:54 qb-cpg-request-5598-5756-19-header
> -rw-------  1 root root 1048576 2013-04-26 10:54 qb-cpg-request-5598-5757-18-data
> -rw-------  1 root root    8252 2013-04-26 10:54 qb-cpg-request-5598-5757-18-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:54 qb-cpg-request-5598-5759-20-data
> -rw-------  1 vcap root    8252 2013-04-26 10:54 qb-cpg-request-5598-5759-20-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:56 qb-cpg-request-5598-5808-21-data
> -rw-------  1 vcap root    8252 2013-04-26 10:56 qb-cpg-request-5598-5808-21-header
> -rw-------  1 root root 1048576 2013-04-26 10:54 qb-cpg-response-5598-5754-17-data
> -rw-------  1 root root    8248 2013-04-26 10:54 qb-cpg-response-5598-5754-17-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:54 qb-cpg-response-5598-5756-19-data
> -rw-------  1 vcap root    8248 2013-04-26 10:54 qb-cpg-response-5598-5756-19-header
> -rw-------  1 root root 1048576 2013-04-26 10:54 qb-cpg-response-5598-5757-18-data
> -rw-------  1 root root    8248 2013-04-26 10:54 qb-cpg-response-5598-5757-18-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:54 qb-cpg-response-5598-5759-20-data
> -rw-------  1 vcap root    8248 2013-04-26 10:54 qb-cpg-response-5598-5759-20-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:56 qb-cpg-response-5598-5808-21-data
> -rw-------  1 vcap root    8248 2013-04-26 10:56 qb-cpg-response-5598-5808-21-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:56 qb-quorum-event-5598-5808-22-data
> -rw-------  1 vcap root    8248 2013-04-26 10:56 qb-quorum-event-5598-5808-22-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:56 qb-quorum-request-5598-5808-22-data
> -rw-------  1 vcap root    8252 2013-04-26 10:56 qb-quorum-request-5598-5808-22-header
> -rw-------  1 vcap root 1048576 2013-04-26 10:56 qb-quorum-response-5598-5808-22-data
> -rw-------  1 vcap root    8248 2013-04-26 10:56 qb-quorum-response-5598-5808-22-header
> #####################################
> 
> snippets from pacemaker-strace after chmod 777 /dev/shm
> 
> ###################
> CIB
> 5833  chown("/dev/shm/qb-cib_shm-event-5833-5858-7-data", 4294967295, 1000) = -1 EPERM (Operation not permitted)
> 5833  chown("/dev/shm/qb-cib_shm-event-5833-5858-7-header", 4294967295, 1000) = -1 EPERM (Operation not permitted)
> 5833  chmod("/dev/shm/qb-cib_shm-event-5833-5858-7-data", 0660) = 0
> 5833  chmod("/dev/shm/qb-cib_shm-event-5833-5858-7-header", 0660) = 0
> ####################
> CRMD
> 5838  connect(3, {sa_family=AF_FILE, path=@"cib_shm"}, 110) = -1 ECONNREFUSED (Connection refused)
> 5838  close(3)                          = 0
> 5838  shutdown(4294967295, 2 /* send and receive */) = -1 EBADF (Bad file descriptor)
> 5838  close(4294967295)                 = -1 EBADF (Bad file descriptor)
> 5838  write(2, "Could not establish cib_shm conn"..., 65) = 65
> 5838  clock_gettime(CLOCK_REALTIME, {1366973927, 255600506}) = 0
> 5838  munmap(0x7f6c1bcc3000, 528384)    = 0
> #########################
> 
> This is looking more and more like a permissions problem on the files read/written via SHM.
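One thing that would explain the chown() EPERM: without CAP_CHOWN, a process may change a file's group only if it owns the file and the target gid is among its own effective or supplementary groups. Worth verifying what the daemons are actually running with, assuming GNU coreutils:

    id vcap                                        # groups the cluster user is really in
    stat -c '%U:%G %a %n' /dev/shm/qb-cib_shm-*    # owner/group/mode the server created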
> 
> I read http://www.ultrabug.fr/pacemaker-vulnerability-and-v1-1-9-release/ and added root to group vcap, and vcap to group root (vcap is my equivalent of the haclient user/group) - no change in behavior. I did add "--with-acls" at compile time, but I'm not planning on using ACLs.
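For reference, those group additions would normally be something like the sketch below (usermod from shadow-utils assumed). One gotcha: supplementary group changes only affect processes started afterwards, so corosync and pacemakerd need a restart before the new memberships can make any difference:

    usermod -a -G vcap root
    usermod -a -G root vcap
    id root && id vcap      # confirm the memberships actually took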

Which exact version (git hash) of pacemaker and libqb are you using?
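If you built from git checkouts, something like this should pin both down (assuming pacemakerd in your build answers --version):

    pacemakerd --version
    cd /path/to/pacemaker && git rev-parse HEAD
    cd /path/to/libqb && git rev-parse HEAD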


