OS-6976: vmm_zsd_lock not held while element removed from vmm_zsd_list

Details

Issue Type: Bug
Priority: 2 - Critical
Status: Resolved
Created at: 2018-05-24T22:59:29.464Z
Updated at: 2018-05-29T16:21:12.036Z

People

Created by: Mike Gerdts [X]
Reported by: Mike Gerdts [X]
Assigned to: Mike Gerdts [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-05-25T17:31:27.123Z)

Fix Versions

2018-06-07 Train Graveyard (Release Date: 2018-06-07)

Labels

bhyve

Description

During zone destruction, a race can cause corruption due to vmm_zsd_lock not being held while removing the zsd from vmm_zsd_list.
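
The locking rule described above can be sketched in userland C. This is a minimal illustration, not the vmm driver code: it assumes POSIX threads in place of kernel mutexes, and every name in it (demo_node_t, demo_lock, demo_insert, demo_remove, demo_count) is hypothetical. The point is only that every insertion, removal, and walk takes the same lock, so no thread can observe or modify half-updated links.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-zone entry; not the real vmm structure. */
typedef struct demo_node {
    struct demo_node *prev;
    struct demo_node *next;
    int zone_id;
} demo_node_t;

/* Circular doubly linked list with a sentinel head, guarded by one lock. */
static demo_node_t demo_head = { &demo_head, &demo_head, -1 };
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Link a node in at the front of the list while holding the lock. */
static void demo_insert(demo_node_t *n)
{
    pthread_mutex_lock(&demo_lock);
    n->next = demo_head.next;
    n->prev = &demo_head;
    demo_head.next->prev = n;
    demo_head.next = n;
    pthread_mutex_unlock(&demo_lock);
}

/*
 * Unlink a node while holding the lock. Without the lock, a concurrent
 * insert or remove could rewrite n->prev or n->next between the two
 * pointer updates and leave the neighbours pointing at stale memory.
 */
static void demo_remove(demo_node_t *n)
{
    pthread_mutex_lock(&demo_lock);
    assert(n->prev != NULL && n->next != NULL); /* node must be linked */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = NULL;
    n->next = NULL;
    pthread_mutex_unlock(&demo_lock);
}

/* Count the linked nodes; readers take the same lock as writers. */
static size_t demo_count(void)
{
    size_t count = 0;

    pthread_mutex_lock(&demo_lock);
    for (demo_node_t *p = demo_head.next; p != &demo_head; p = p->next)
        count++;
    pthread_mutex_unlock(&demo_lock);
    return (count);
}
```

If demo_remove skipped the lock, the failure would likely look like the one reported here: a freed node's neighbours keep pointing into memory that has since been reused for something else.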

thoth debug 92d51b9d8aa4950faea7e0a2dc77f6a0 sessions can be found in SWSUP-1205 and in ~incindents-jpc.

> We have a corrupted DIFO:

> ffffd066d6f93a40::print -a dtrace_difo_t
ffffd066d6f93a40 {
    ffffd066d6f93a40 dtdo_buf = 0xffffd068d70eeba8
    ffffd066d6f93a48 dtdo_inttab = 0xffffd066d7b0d010
    ffffd066d6f93a50 dtdo_strtab = 0xffffd066d16cb7c0 ""
    ffffd066d6f93a58 dtdo_vartab = 0xffffd063649c5268
    ffffd066d6f93a60 dtdo_len = 0xc04730b0
    ffffd066d6f93a64 dtdo_intlen = 0xffffffff
    ffffd066d6f93a68 dtdo_strlen = 0x3
    ffffd066d6f93a6c dtdo_varlen = 0x1
    ffffd066d6f93a70 dtdo_rtype = {
        ffffd066d6f93a70 dtdt_kind = 0
        ffffd066d6f93a71 dtdt_ckind = 0
        ffffd066d6f93a72 dtdt_flags = 0
        ffffd066d6f93a73 dtdt_pad = 0
        ffffd066d6f93a74 dtdt_size = 0
    }
    ffffd066d6f93a78 dtdo_refcnt = 0
    ffffd066d6f93a7c dtdo_destructive = 0
}

> This is thoth dump 92d51b9d8aa4950faea7e0a2dc77f6a0
> As it turns out, ffffd066d6f93a60 has been corrupted with the value ffffffffc04730b0

> ffffffffc04730b0/K
vmm_zsd_list+0x10:              ffffd0631fc44ae0 

> It did get stomped on by a pointer.
> Oh, interesting:

> vmm_zsd_list::walk list
0xffffd0631fc44ac0
0xffffd063656d4480
0xffffd063656ca140
0xffffd06362f43380
0xffffd063847a3440
0xffffd06368fd5140
0xffffd063847a8940
0xffffd066d61a7240
0xffffd066d4eb2c00
0xffffd066cee36940
0xffffd066d9c12ec0
0xffffd066daa31100
0xffffd066cefc2bc0
0xffffd066da903c80
0xffffd066d912b8c0
0xffffd066d90f1f40
0xffffd066d4cf5d00
0xffffd066daf146c0
0xffffd066d90d2b80
0xffffd066d58248c0
0xffffd0632bc5a000
0xffffd063698e9740
0xffffd0632bdf6100
0xffffd066cee93140
0xffffd066d93c1140
0xffffd06a24fde380
mdb: failed to read list element at 0xffffffe5: no mapping for address
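
The observation that the address ffffd066d6f93a60 holds ffffffffc04730b0 follows from reading the two adjacent 32-bit fields, dtdo_len at ...a60 and dtdo_intlen at ...a64, as one 64-bit little-endian word. The sketch below shows only that arithmetic. The demo_pair_t type and both helper functions are hypothetical and exist purely for illustration; it assumes the low word sits at the lower address, as it does on x86.

```c
#include <stdint.h>

/* Hypothetical pair mirroring two adjacent 32-bit structure members. */
typedef struct demo_pair {
    uint32_t len;     /* lower address, like dtdo_len at ...a60 */
    uint32_t intlen;  /* next four bytes, like dtdo_intlen at ...a64 */
} demo_pair_t;

/* Read both fields as the single 64-bit value an 8-byte load would see. */
static uint64_t demo_as_word(const demo_pair_t *p)
{
    return (((uint64_t)p->intlen << 32) | (uint64_t)p->len);
}

/* Show what an 8-byte store of `value` leaves in each 32-bit field. */
static demo_pair_t demo_split(uint64_t value)
{
    demo_pair_t p;

    p.len = (uint32_t)(value & 0xffffffffu);
    p.intlen = (uint32_t)(value >> 32);
    return (p);
}
```

With the values from the dump, dtdo_len = 0xc04730b0 and dtdo_intlen = 0xffffffff combine to 0xffffffffc04730b0, which mdb resolves to vmm_zsd_list+0x10. That is why two implausible lengths point to a single stray pointer-sized write rather than two independent corruptions.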

Comments

Comment by Mike Gerdts [X]
Created at 2018-05-25T13:24:24.006Z

I was able to reproduce this with three instances of the script below, each bouncing a different zone. It reproduced after about 14,000 iterations in total across the three instances.

#! /bin/ksh
# Repeatedly ready and halt the zone named in $1, printing an iteration count.

integer i=0

while true; do
        let i++
        echo "$1" "$i"
        zoneadm -z "$1" ready
        zoneadm -z "$1" halt
done

With the fix, the same three zones have gone through about 387,000 iterations (so far) without reproducing.


Comment by Jira Bot
Created at 2018-05-25T16:32:08.717Z

illumos-joyent commit 66904fd497a76f6f9811bbdd9ae5bc32944c8045 (branch master, by Mike Gerdts)

OS-6976 vmm_zsd_lock not held while element removed from vmm_zsd_list
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: Bryan Cantrill <bryan@joyent.com>


Comment by Jira Bot
Created at 2018-05-25T17:16:21.618Z

illumos-joyent commit 06c3d5ed0686a9d098bb8c920cffe0bc314699e1 (branch release-20180315, by Mike Gerdts)

OS-6976 vmm_zsd_lock not held while element removed from vmm_zsd_list
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: Bryan Cantrill <bryan@joyent.com>


Comment by Trent Mick [X]
Created at 2018-05-26T05:23:16.842Z

@angela The build https://jenkins.joyent.us/job/platform/6134/ is complete: image 859738a7-0c81-45ea-bb91-cef76e724ded. Do we want that added to the 'release' channel?


Comment by Trent Mick [X]
Created at 2018-05-29T16:21:12.036Z

^^^ platform build timestamp is 20180525T172343Z

Now added to the "release" channel:

$ updates-imgadm -C staging channel-add release 859738a7-0c81-45ea-bb91-cef76e724ded
Added image 859738a7-0c81-45ea-bb91-cef76e724ded (platform@release-20180315-20180525T172343Z) to "release" channel