[ClusterLabs] attrd/attrd_updater asynchronous behavior

Mon Apr 16 17:28:39 EDT 2018

Hi,

I have a question regarding attrd's asynchronous behavior.

In PAF, during the election process to pick the best PgSQL master, we use
private attributes to publish the status (LSN) of each pgsql instance during
the pre-promote action.

Because we need these LSNs from every node during the promote action, each time
we call

  attrd_updater --name blah --update x

we have a loop running

  attrd_updater --name blah --query

until the fetched value is the same as the one we set. We are basically trying
to force synchronous behavior.

See: https://github.com/ClusterLabs/PAF/blob/master/script/pgsqlms#L310
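
As a rough illustration of the pattern described above (not the actual PAF
code, which is Perl), the update-then-poll loop could be sketched in shell as
follows. The function names, the try limit and POLL_INTERVAL are made up for
this sketch, and the parsing assumes attrd_updater prints the value as
value="..." in its query output.

```shell
# Sketch: set a node attribute, then poll attrd until it reports the value
# back. A successful return only shows that the attrd answering the query has
# the value; it says nothing about what the other nodes see.

# Extract the value="..." field from attrd_updater query output.
# Assumption: the output looks like 'name="..." host="..." value="..."'.
parse_attr_value() {
    sed -n 's/.*value="\([^"]*\)".*/\1/p'
}

# set_attr_and_wait NAME VALUE [MAX_TRIES]  (hypothetical helper)
# Returns 0 once the queried value equals VALUE, 1 if MAX_TRIES is exhausted.
set_attr_and_wait() {
    name=$1
    value=$2
    max_tries=${3:-50}

    attrd_updater --name "$name" --update "$value" || return 1

    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        current=$(attrd_updater --name "$name" --query 2>/dev/null | parse_attr_value)
        [ "$current" = "$value" ] && return 0
        tries=$((tries + 1))
        # POLL_INTERVAL is a made-up knob for this sketch (seconds).
        sleep "${POLL_INTERVAL:-1}"
    done
    return 1
}
```

Bounding the number of tries keeps the action from hanging forever if the
value never shows up.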

But we have an issue on GitHub that makes me think this might not be enough to
make sure all the private attributes become available across the cluster
during the pre-promote action and before the promote action is triggered.
See: https://github.com/ClusterLabs/PAF/issues/131

In this issue, a simple switchover (pcs move) fails during the promote action
on the designated slave, because it could not get the LSN of all the other
nodes:

  ocf-exit-reason:Can not get LSN location for "pg1-dev"
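
To make the failing step concrete, here is a simplified shell sketch of that
kind of check, assuming each node's value is read with attrd_updater's --node
option. It is not the PAF implementation; the helper names are hypothetical,
and the attribute and node names passed in are placeholders.

```shell
# Sketch: read one attribute for several nodes and stop at the first node
# whose value is missing, as seen by the attrd answering the query.

# get_node_attr NAME NODE: print the value reported for NODE, or nothing.
# Assumption: query output contains the value as value="...".
get_node_attr() {
    attrd_updater --name "$1" --node "$2" --query 2>/dev/null |
        sed -n 's/.*value="\([^"]*\)".*/\1/p'
}

# check_all_nodes NAME NODE...: print "NODE VALUE" for each node, or
# return 1 and name the first node that has no value.
check_all_nodes() {
    attr=$1
    shift
    for node in "$@"; do
        value=$(get_node_attr "$attr" "$node")
        if [ -z "$value" ]; then
            echo "Can not get $attr for \"$node\"" >&2
            return 1
        fi
        echo "$node $value"
    done
    return 0
}
```

If a node's value has not reached the queried attrd yet, a check like this
fails on its first attempt, which is the symptom reported in the issue.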

* is looping until the value becomes available enough to conclude that all
  other nodes have the same value? Or might it be available only locally on
  the action's node and not yet "replicated" to the other nodes?
* any other suggestions about how we could share values synchronously with all
  other nodes?

Thanks for your help,