[ClusterLabs] Pacemaker not starting ISCSI LUNs and Targets
Octavian Ciobanu
coctavian1979 at gmail.com
Sat Aug 26 08:41:55 EDT 2017
Hey John.
I also ran into the same error message, "ERROR: This Target already
exists in configFS", a while back. When I ran targetcli and listed its
configuration, I could see the target under the iscsi folder. In my case
it was caused by a forced reboot of the node.
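For reference, here is a minimal sketch of checking for such a leftover directly in configfs rather than through targetcli. It assumes LIO's configfs tree is mounted at /sys/kernel/config/target, with iSCSI targets appearing as directories under its iscsi subdirectory. The function name stale_target_exists and its optional second argument are hypothetical, added only for illustration:

```shell
#!/bin/sh
# Sketch only: report whether an iSCSI target directory is still present
# in configfs.
# Assumption: LIO's configfs tree lives at /sys/kernel/config/target and
# iSCSI targets appear as directories under its "iscsi" subdirectory.
# stale_target_exists is a hypothetical helper, not part of the agent.
stale_target_exists() {
    iqn="$1"
    # Optional second argument overrides the configfs root (for dry runs).
    root="${2:-/sys/kernel/config/target}"
    [ -d "$root/iscsi/$iqn" ]
}

# Example usage (IQN taken from the logs in this thread):
# if stale_target_exists iqn.2017-08.access.net:prod-1-ha; then
#     echo "target still present in configfs"
# fi
```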
To work around it, I added the line "ocf_run
targetcli /iscsi delete ${OCF_RESKEY_iqn}" to
/usr/lib/ocf/resource.d/heartbeat/iSCSITarget at line 330, just before
"ocf_run targetcli /iscsi create ${OCF_RESKEY_iqn} || exit
$OCF_ERR_GENERIC". That command deletes the target that is about to be
created if it already exists.
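The change above can be sketched as a standalone function. This is only an illustration of the delete-before-create idea, not the actual agent code: recreate_target and the TARGETCLI override are hypothetical names, and in the agent itself the calls go through ocf_run and use ${OCF_RESKEY_iqn}.

```shell
#!/bin/sh
# Sketch only: delete a possibly stale target, then create it again.
# TARGETCLI is a hypothetical override so the logic can be tried with a
# stub command; by default the real targetcli binary is used.
TARGETCLI="${TARGETCLI:-targetcli}"

# recreate_target is a hypothetical helper, not part of the agent.
recreate_target() {
    iqn="$1"
    # Remove any leftover target (for example after a forced reboot).
    # A failure here is ignored: on a clean start there is nothing to delete.
    "$TARGETCLI" /iscsi delete "$iqn" >/dev/null 2>&1 || true
    # Create the target; return 1 (the value of OCF_ERR_GENERIC) on failure.
    "$TARGETCLI" /iscsi create "$iqn" || return 1
}

# Example usage (IQN taken from the logs in this thread):
# recreate_target iqn.2017-08.access.net:prod-1-ha
```

Ignoring the result of the delete keeps a normal start unaffected, at the cost of also hiding any genuine error from that step.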
I hope this workaround will help you with your issue until a valid solution
is available.
Best regards
Octavian Ciobanu
On Tue, Aug 22, 2017 at 12:19 AM, John Keates <john at keates.nl> wrote:
> Hi,
>
> I have a strange issue where LIO-T based ISCSI targets and LUNs most of
> the time simply don’t work. They either don’t start, or bounce around until
> no more nodes are tried.
> The less-than-useful information in the logs looks like:
>
> Aug 21 22:49:06 [10531] storage-1-prod pengine: warning:
> check_migration_threshold: Forcing iscsi0-target away from storage-1-prod
> after 1000000 failures (max=1000000)
>
> Aug 21 22:54:47 storage-1-prod crmd[2757]: notice: Result of start
> operation for ip-iscsi0-vlan40 on storage-1-prod: 0 (ok)
> Aug 21 22:54:47 storage-1-prod iSCSITarget(iscsi0-target)[5427]: WARNING:
> Configuration parameter "tid" is not supported by the iSCSI implementation
> and will be ignored.
> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO:
> Parameter auto_add_default_portal is now 'false'.
> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO:
> Created target iqn.2017-08.acccess.net:prod-1-ha. Created TPG 1.
> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: ERROR:
> This Target already exists in configFS
> Aug 21 22:54:48 storage-1-prod crmd[2757]: notice: Result of start
> operation for iscsi0-target on storage-1-prod: 1 (unknown error)
> Aug 21 22:54:49 storage-1-prod iSCSITarget(iscsi0-target)[5536]: INFO:
> Deleted Target iqn.2017-08.access.net:prod-1-ha.
> Aug 21 22:54:49 storage-1-prod crmd[2757]: notice: Result of stop
> operation for iscsi0-target on storage-1-prod: 0 (ok)
>
> Now, the unknown error seems to actually be a targetcli type of error:
> "This Target already exists in configFS". Checking with targetcli shows
> zero configured items on either node.
> Manually starting the LUNs and target gives:
>
>
> john@storage-1-prod:~$ sudo pcs resource debug-start iscsi0-target
> Error performing operation: Operation not permitted
> Operation start for iscsi0-target (ocf:heartbeat:iSCSITarget) returned 1
> > stderr: WARNING: Configuration parameter "tid" is not supported by the
> iSCSI implementation and will be ignored.
> > stderr: INFO: Parameter auto_add_default_portal is now 'false'.
> > stderr: INFO: Created target iqn.2017-08.access.net:prod-1-ha.
> Created TPG 1.
> > stderr: ERROR: This Target already exists in configFS
>
> but now targetcli shows at least the target. Checking with crm status
> still shows the target as stopped.
> Manually starting the LUNs gives:
>
>
> john@storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun0
> Operation start for iscsi0-lun0 (ocf:heartbeat:iSCSILogicalUnit) returned
> 0
> > stderr: INFO: Created block storage object iscsi0-lun0 using
> /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-root.
> > stderr: INFO: Created LUN 0.
> > stderr: DEBUG: iscsi0-lun0 start : 0
> john@storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun1
> Operation start for iscsi0-lun1 (ocf:heartbeat:iSCSILogicalUnit) returned
> 0
> > stderr: INFO: Created block storage object iscsi0-lun1 using
> /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-swap.
> > stderr: /usr/lib/ocf/resource.d/heartbeat/iSCSILogicalUnit: line 378:
> /sys/kernel/config/target/core/iblock_0/iscsi0-lun1/wwn/vpd_unit_serial:
> No such file or directory
> > stderr: INFO: Created LUN 1.
> > stderr: DEBUG: iscsi0-lun1 start : 0
>
> So the second LUN seems to have some bad parameters created by the
> iSCSILogicalUnit script. Checking with targetcli however shows both LUNs
> and the target up and running.
> Checking again with crm status (and pcs status) shows all three resources
> still stopped. Since LUNs are colocated with the target and the target
> still has fail counts, I clear them with:
>
> sudo pcs resource cleanup iscsi0-target
>
> Now the LUNs and target are all active in crm status / pcs status. But
> it’s quite a manual process to get this to work! I’m thinking either my
> configuration is bad or there is some bug somewhere in targetcli / LIO or
> the iSCSI heartbeat script.
> On top of all the manual work, it still breaks on any action. A move,
> failover, reboot etc. instantly breaks it. Everything else (the underlying
> ZFS pool, the DRBD device, the IPv4 IPs etc.) moves just fine; it’s only
> the iSCSI that’s being problematic.
>
> Concrete questions:
>
> - Is my config bad?
> - Is there a known issue with ISCSI? (I have only found old references
> about ordering)
>
> I have added the output of crm config show as cib.txt and the output of a
> fresh boot of both nodes is:
>
> Current DC: storage-2-prod (version 1.1.16-94ff4df) - partition with quorum
> Last updated: Mon Aug 21 22:55:05 2017
> Last change: Mon Aug 21 22:36:23 2017 by root via cibadmin on
> storage-1-prod
>
> 2 nodes configured
> 21 resources configured
>
> Online: [ storage-1-prod storage-2-prod ]
>
> Full list of resources:
>
> ip-iscsi0-vlan10 (ocf::heartbeat:IPaddr2): Started
> storage-1-prod
> ip-iscsi0-vlan20 (ocf::heartbeat:IPaddr2): Started
> storage-1-prod
> ip-iscsi0-vlan30 (ocf::heartbeat:IPaddr2): Started
> storage-1-prod
> ip-iscsi0-vlan40 (ocf::heartbeat:IPaddr2): Started
> storage-1-prod
> Master/Slave Set: drbd_master_slave0 [drbd_disk0]
> Masters: [ storage-1-prod ]
> Slaves: [ storage-2-prod ]
> Master/Slave Set: drbd_master_slave1 [drbd_disk1]
> Masters: [ storage-2-prod ]
> Slaves: [ storage-1-prod ]
> ip-iscsi1-vlan10 (ocf::heartbeat:IPaddr2): Started
> storage-2-prod
> ip-iscsi1-vlan20 (ocf::heartbeat:IPaddr2): Started
> storage-2-prod
> ip-iscsi1-vlan30 (ocf::heartbeat:IPaddr2): Started
> storage-2-prod
> ip-iscsi1-vlan40 (ocf::heartbeat:IPaddr2): Started
> storage-2-prod
> st-storage-1-prod (stonith:meatware): Started storage-2-prod
> st-storage-2-prod (stonith:meatware): Started storage-1-prod
> zfs-iscsipool0 (ocf::heartbeat:ZFS): Started storage-1-prod
> zfs-iscsipool1 (ocf::heartbeat:ZFS): Started storage-2-prod
> iscsi0-lun0 (ocf::heartbeat:iSCSILogicalUnit): Stopped
> iscsi0-lun1 (ocf::heartbeat:iSCSILogicalUnit): Stopped
> iscsi0-target (ocf::heartbeat:iSCSITarget): Stopped
> Clone Set: dlm-clone [dlm]
> Started: [ storage-1-prod storage-2-prod ]
>
> Failed Actions:
> * iscsi0-target_start_0 on storage-2-prod 'unknown error' (1): call=99,
> status=complete, exitreason='none',
> last-rc-change='Mon Aug 21 22:54:49 2017', queued=0ms, exec=954ms
> * iscsi0-target_start_0 on storage-1-prod 'unknown error' (1): call=98,
> status=complete, exitreason='none',
> last-rc-change='Mon Aug 21 22:54:47 2017', queued=0ms, exec=1062ms
>
> Regards,
> John
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>