OS-7350: vmadm delete cannot clean up a zone in "configured" state

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2018-11-04T21:53:34.381Z
Updated at:2019-02-15T18:33:14.884Z

People

Created by:Former user
Reported by:Former user
Assigned to:Former user

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2019-02-15T18:32:48.748Z)

Fix Versions

2019-02-28 Mind Grapes (Release Date: 2019-02-28)

Description

From time to time I have seen zones in the configured state. How they got there, I don't know. However, I do know that the process to clean them up is not very intuitive to a SmartOS user.

Apparently, I'm not the only one to see this. See OS-6523. Also mentioned on IRC in #smartos today:

[14:03:42]  <nfg243>	How do i delete a zone that is stuck in the state "provisioning" ?
[14:04:24]  <nfg243>	There is no zoneadmd associated with the failed zone in the process list, vmadm delete <zone> and vmadm kill <zone> are not getting rid of it
[14:23:51]  <nfg243>	vmadm just gives me this error
[14:23:52]  <nfg243>	Failed to delete VM 48211f69-3048-4577-ebcb-89145569efa3: first of 1 error: Command failed: zoneadm: zone '48211f69-3048-4577-ebcb-89145569efa3': is already in state 'configured'.
[14:24:05]  <nfg243>	Cannot for the life of me figure out how to get rid of it, already tried rebooting

I first started seeing this after vminfod landed. I don't think that vminfod is getting in the way of the delete, but I suspect that it is not cleaning up provision failures in all cases. Regardless, it seems quite reasonable for vmadm delete to clean up a zone in the configured state.

Comments

Comment by Former user
Created at 2019-01-02T15:54:42.082Z
Updated at 2019-01-03T19:57:09.494Z

From smartos-live#819:

If you try creating an OS VM that has a lofs mount with a source path that doesn't exist in the GZ, vmadm create fails but doesn't cleanly unwind the half-created zone. You end up with a zone that vmadm cannot use, but also cannot delete, and you have to go use zonecfg, zfs destroy, and zoneadm to clean up the leftover state.
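
As a rough illustration of that manual cleanup (not the fix made for this ticket), a sketch might look like the following. It assumes the zone's dataset lives at zones/<uuid>, which is the usual SmartOS layout but is not stated in this ticket. The function name and the DRY_RUN variable are hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: manual cleanup of a zone stuck in the "configured" state.
# ASSUMPTION: the zone's dataset is zones/<uuid>; adjust if your pool differs.
# cleanup_configured_zone and DRY_RUN are hypothetical names for illustration.
cleanup_configured_zone() {
    uuid="$1"
    if [ -z "$uuid" ]; then
        echo "usage: cleanup_configured_zone <uuid>" >&2
        return 2
    fi

    # Default to a dry run: prefix each command with echo so nothing is destroyed.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    else
        run=""
    fi

    # Remove any dataset the failed provision left behind. If no dataset was
    # created, zfs reports an error and the function carries on.
    $run zfs destroy -r "zones/$uuid"

    # Remove the zone configuration; -F skips the confirmation prompt.
    $run zonecfg -z "$uuid" delete -F
}

# Example (dry run):
#   cleanup_configured_zone 48211f69-3048-4577-ebcb-89145569efa3
```

Check the printed commands first, then rerun with DRY_RUN=0 to execute them. No zoneadm uninstall step is included because a zone in the configured state was never installed, which is why zoneadm rejects the request in the error quoted in the description.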


Comment by Former user
Created at 2019-02-15T18:05:45.213Z

CR 5594 created

patchset 1 tested. All tests (except test-bhyve-pci_slot.js, explained below) passing, including the new tests:

# test deleting "configured" VM
ok 101 error creating VM
ok 102 found expected error message: could not verify fs /foo: could not access /this/path/does/not/exist/nor/should/it
ok 103 VM uuid found: bcb748e8-f351-416f-b965-d84980d53f4c
ok 104 VM.load bcb748e8-f351-416f-b965-d84980d53f4c: success
ok 105 VM in state configured
ok 106 VM.delete bcb748e8-f351-416f-b965-d84980d53f4c: success
#  TEST COMPLETE IN 4449 SECONDS, SUMMARY:
#
# PASS: 4950 / 4956
# FAIL: 6 / 4956
#
#  ** FAILED TESTS **
#  /usr/vm/test/tests/test-bhyve-pci_slot.js
#
# log files available in: /tmp/vmtest.1550175690.12337

My COAL is currently experiencing issues that are causing this specific bhyve test to fail. This was happening before I tested this change and is a separate issue.


Comment by Jira Bot
Created at 2019-02-15T18:32:16.608Z

smartos-live commit 1633dfcadffd609293a38d9aac697db2fa6e7f90 (branch master, by Dave Eddy)

OS-7350 vmadm delete cannot clean up a zone in "configured" state
Reviewed by: Mike Gerdts <mike.gerdts@joyent.com>
Approved by: Josh Wilsdon <josh@wilsdon.ca>


Comment by Former user
Created at 2019-02-15T18:33:14.884Z

The above commit should fix this issue. If the issue persists please feel free to open a new ticket and link this ticket.