[Pacemaker] Patches: RFC before pull request
Lars Ellenberg
lars.ellenberg at linbit.com
Tue Dec 9 14:33:29 UTC 2014
Andrew,
All,
Please have a look at the patches I queued up here:
https://github.com/lge/pacemaker/commits/for-beekhof
Most (not all) are specific to the heartbeat cluster stack.
Thanks,
Lars
A few comments here:
-----
This effectively changes crm_mon output,
but also changes logging where this method is invoked:
Low: native_print: report target-role as well
This is for the "Why does my resource not start?" guys who
forgot to remove the limiting target-role setting.
Report target role (unless "Started", which is the default anyways),
if it limits our abilities (Slave, Stopped),
or if it differs from the current status.
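
A minimal sketch of the reporting rule described above, as one possible reading of it. The enum and the function name are hypothetical stand-ins, not the identifiers used in native_print():

```c
#include <stdbool.h>

/* Hypothetical role enum, for illustration only. */
enum role { ROLE_UNKNOWN, ROLE_STOPPED, ROLE_STARTED, ROLE_SLAVE, ROLE_MASTER };

/* Decide whether the target role is worth printing next to the resource. */
static bool should_report_target_role(enum role target, enum role current)
{
    switch (target) {
    case ROLE_UNKNOWN:   /* no target-role configured */
    case ROLE_STARTED:   /* the default, never interesting */
        return false;
    case ROLE_STOPPED:   /* limits what the resource may do: always report */
    case ROLE_SLAVE:
        return true;
    default:             /* otherwise report only if it differs from reality */
        return target != current;
    }
}
```

With this reading, a resource with target-role "Stopped" is always flagged, while target-role "Master" is flagged only as long as the resource is not actually running as master.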
-----
Heartbeat specific:
Low: allow heartbeat to spawn the pengine itself, and tell crmd about it
Heartbeat 3.0.6 now may spawn the pengine directly, and will announce
this in the environment -- I introduced the setting "crmd_spawns_pengine".
This improves shutdown behavior. Otherwise I regularly find an orphaned
pengine process after pacemaker shutdown.
-----
Heartbeat specific, as a consequence of the fix below:
Low: add debugging aid to help spot missing set_msg_callback()s on heartbeat
In ha_msg_dispatch(), change from rcvmsg() to readmsg().
rcvmsg() is internally simply a wrapper around readmsg(),
which silently deletes messages without matching callback.
Use readmsg() directly here. It will only return unprocessed (by
callbacks) messages, so log a warning, notice or debug message
depending on message header information, and ha_msg_del() it ourselves.
-----
Heartbeat specific bug fix:
High: fix stonith ignoring its own messages on heartbeat
Since the introduction of the additional F_TYPE messages
T_STONITH_NOTIFY and T_STONITH_TIMEOUT_VALUE, and their use as message
types in global heartbeat cluster messages, stonith-ng was broken on the
heartbeat cluster stack.
When delegation was made the default, and the result could only be
reaped by listening for the T_STONITH_NOTIFY message, no-one (but
stonithd itself) would ever notice successful completion,
and stonith would be re-issued forever.
Registering callbacks for these F_TYPE fixes these hung stonith and
stonith_admin operations on the heartbeat cluster stack.
-----
Heartbeat specific:
Medium: fix tracking of peer client process status on heartbeat
Don't optimistically assume that peer client processes are alive,
or that a node that can talk to us is in fact member of the same
ccm partition.
Whenever ccm tells us about a new membership, *ask* for peer client
process status.
-----
This oneliner may well be relevant for corosync CPG as well,
possibly one of the reasons the pcmk_cpg_membership() has this funny
"appears to be online even though we think it is dead" block?
fix crm_update_peer_proc to NOT ignore flags if partially set
The "set_bit()" function used here actually deals with masks, not bit numbers.
The "flag" argument should in fact be plural: flags.
These proc flag bits are not always set one at a time,
but for example as "crm_proc_crmd | crm_proc_cpg",
and not necessarily cleared with the same combination.
Ignoring to-be-set flags just because *some* of the flag bits are
already set is clearly a bug, and may be the reason for stale process
cache information.
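
To illustrate the difference between testing "any bit set" and "all bits set", here is a small self-contained sketch. The flag values and function names are hypothetical and only model the check; they are not the actual crm_update_peer_proc() code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real proc flag bits. */
enum {
    PROC_CRMD = 0x1,
    PROC_CPG  = 0x2
};

/* Buggy check: skips the update as soon as *any* requested bit is set. */
static bool set_flags_buggy(uint32_t *processes, uint32_t flags)
{
    assert(processes != NULL);
    if (*processes & flags) {
        return false;           /* wrongly treated as "nothing to do" */
    }
    *processes |= flags;
    return true;
}

/* Fixed check: a no-op only if *all* requested bits are already set. */
static bool set_flags_fixed(uint32_t *processes, uint32_t flags)
{
    assert(processes != NULL);
    if ((*processes & flags) == flags) {
        return false;
    }
    *processes |= flags;
    return true;
}
```

Starting from a cache that only has PROC_CRMD set, a request for "PROC_CRMD | PROC_CPG" leaves PROC_CPG unset in the buggy variant, while the fixed variant sets it and reports a change.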
-----
Heartbeat specific:
Medium: map heartbeat JOIN/LEAVE status to ONLINE/OFFLINE
The rest of the code deals in "online" and "offline",
not "join" and "leave". Need to map these states,
or the rest of the code won't work properly.
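
The mapping itself could look roughly like this. It is a sketch only: the function name is hypothetical, and the status strings are written as literals taken from the description above rather than as the constants the real code would use:

```c
#include <stddef.h>
#include <string.h>

/* Translate a heartbeat client status into the vocabulary used elsewhere.
 * Unknown values are passed through unchanged. */
static const char *map_peer_status(const char *hb_status)
{
    if (hb_status == NULL) {
        return NULL;
    }
    if (strcmp(hb_status, "join") == 0) {
        return "online";
    }
    if (strcmp(hb_status, "leave") == 0) {
        return "offline";
    }
    return hb_status;
}
```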
-----
Generic: if shutdown is requested before the stonith connection was ever established
(due to other problems), insisting on retrying the stonith connection confused the shutdown.
Medium: don't trigger a stonith_reconnect if no longer required
Get rid of some spurious error messages, and speed up shutdown,
even if the connection to the stonith daemon failed.
-----
Non-functional change, just for readability:
Low: use CRM_NODE_MEMBER, not CRM_NODE_ACTIVE
ACTIVE is defined to be MEMBER anyways:
include/crm/cluster.h:#define CRM_NODE_ACTIVE CRM_NODE_MEMBER
Don't confuse the reader of the code
by implying it was something different.
-----
Heartbeat specific, packaging only:
Low: heartbeat 3.0.6 knows how to find the daemons; drop compat symlinks