OS-7369: full UART FIFO can send bhyve into a spin

Details

Issue Type: Bug
Priority: 4 - Normal
Status: Resolved
Created at: 2018-11-13T10:13:38.561Z
Updated at: 2018-11-26T16:01:37.082Z

People

Created by: Marsell Kukuljevic
Reported by: Marsell Kukuljevic
Assigned to: Patrick Mooney [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-11-26T16:01:30.236Z)

Fix Versions

2018-12-06 Grizz (Release Date: 2018-12-06)

Labels

bhyve

Description

When running mdata-get to retrieve long metadata strings, the RDP session sometimes dies. Sometimes it's just RDP, and sometimes all of Windows' networking appears to go on hiatus for a few seconds.

This occurs reasonably reliably on a CN running 20181109T134315Z (and previous versions). Annoyingly, I cannot replicate this locally inside VMware running SmartOS 20181108T015750Z (and previous versions). Given that this involves networking, one notable difference between the two is that the CN is running fabrics, whereas the VMware instance is not.

Running mdata-get to fetch a large value:

PS C:\smartdc\mdata> .\mdata-get root_authorized_keys
plat_recv timeout
receive timeout, resetting protocol...
plat_recv timeout
receive timeout, resetting protocol...
<terminated manually>

In another terminal, running concurrently:

PS C:\Users\Administrator> ping -t 1.1.1.1

Pinging 1.1.1.1 with 32 bytes of data:
Reply from 1.1.1.1: bytes=32 time=3ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59  <- this is about where the mdata-get command was initiated
Request timed out.
Reply from 1.1.1.1: bytes=32 time=293ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Request timed out.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Request timed out.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Request timed out.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Reply from 165.225.169.59: Destination host unreachable.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59

Ping statistics for 1.1.1.1:
    Packets: Sent = 94, Received = 84, Lost = 10 (10% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 293ms, Average = 5ms
Control-C

However, although the RDP session regularly dies when mdata-get emits "receive timeout, resetting protocol", ping is sometimes unaffected. For example, this run:

PS C:\Users\Administrator> ping -t 1.1.1.1

Pinging 1.1.1.1 with 32 bytes of data:
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59  <- where mdata-get was run
Request timed out.
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59  <- about where mdata timeout occurred and RDP session was lost
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59  <- roughly where RDP came available again
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59

Ping statistics for 1.1.1.1:
    Packets: Sent = 101, Received = 100, Lost = 1 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 2ms, Average = 1ms
Control-C

Nothing obvious shows up in the metadata service's logs; they look normal:

{"name":"metadata","hostname":"RAA09033","pid":6730,"level":20,"msg":"createServersOnExistingZones for 23 zones","time":"2018-11-13T09:54:04.447Z","v":0}
{"name":"metadata","hostname":"RAA09033","pid":6730,"level":30,"msg":"created zone metadata sockets on 0 / 23 zones took 0.000418055s (418.06us)","time":"2018-11-13T09:54:04.447Z","v":0}
{"name":"metadata","hostname":"RAA09033","pid":6730,"brand":"bhyve","zonename":"9698d9f6-f523-4b5f-e8ca-99960dcf351d","level":30,"response":"SUCCESS","query":{"cmd":"GET","arg":"root_authorized_keys","req_v":2,"elapsed":0.83473},"msg":"handled GET root_authorized_keys","time":"2018-11-13T09:54:31.153Z","v":0}
{"name":"metadata","hostname":"RAA09033","pid":6730,"level":20,"msg":"createServersOnExistingZones for 23 zones","time":"2018-11-13T09:55:04.447Z","v":0}
{"name":"metadata","hostname":"RAA09033","pid":6730,"level":30,"msg":"created zone metadata sockets on 0 / 23 zones took 0.000199646s (199.65us)","time":"2018-11-13T09:55:04.447Z","v":0}

Comments

Comment by Marsell Kukuljevic
Created at 2018-11-13T10:34:42.223Z
Updated at 2018-11-13T11:24:21.646Z

It's not just long values. Running mdata-get in a loop on a key with a small 10-byte value causes serious latency spikes in ping:

Pinging 1.1.1.1 with 32 bytes of data:
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1345ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Reply from 1.1.1.1: bytes=32 time=220ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2492ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=454ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=575ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=178ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=313ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=535ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=195ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=344ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2515ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59

One possibility is that Windows' scheduler might be affected. To check, run an mdata-get loop while separately running the following script:

function getMs() {
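    # ToFileTime() returns 100-nanosecond ticks, so dividing by 10,000 gives milliseconds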
    [Math]::Round((Get-Date).ToFileTime() / 10000)
}

$old = getMs
while ($true) {
    Start-Sleep -seconds 1
    $new = getMs
    $delta = $new - $old
    $old = $new
    $delta
}

The above script repeatedly takes the time, sleeps 1s, then prints how long the 1s sleep really took in milliseconds. The results:

1000
1016
1015
1016
1000
1015
1016
1000
1000
1016
1000
1015
1016
1000
1015
1000
1000
1016
1000
1016
1015
1000
1016
1015
1000
1000
1000
1016
1000
1016
1000
1015
1016
1000
1015
1016
1000
1016
1015
1016
1015
1000
1000
1000
1016
1016
1015
1016
1015    <- where mdata-get loop starts
1016
1110
1015
1016
1000
1000
1071
1153
7338    <-
1012
1004
1175
1012
1016
1015
1000
2215    <-
1004
1016
1000
1205
1137
1188
1142
1066
1152
1032
1273
1242
1235
1015
1000
1023
1008
1000
3894    <-
1012
1016
1015
1001
1015
1000
1177
1011
1000
1015
1016
1468
1001
1000
1015
1019
1053
1022
9937    <-
1016
1015
1147
1044
1057
1174
1137
1028
1107
1019
1133
1014
10829   <-
1030
1008
1151
1013
1015
4814    <-
1014
1000
1000
1016
1016
1143
1015
1024
1073
1010
1291
1116
1015
18543   <-
1019
1000
1016
1000
1363
1012
1000
2121    <-
1004
1015
1016    <- about where mdata-get loop ends
1000
1312
1157
1031
1000
1015
1000
1016
1016
1000
1000
1015
1000
1000
1016
1015
1000
1016
1000
1016
1015
1032
1031
1015
1016
1016
1000
1015
1000
1000
1031
1000
1016
1000
1016
1000
1015
1000

There are some pretty serious latency spikes, up to 18s for a sleep that should take 1s. The time taken by mdata-get tends to correlate fairly closely with these spikes; alas, I didn't keep proper timestamps, so the two time series can't be lined up precisely.

Tangentially related: the VM has four vcpus, but its cpu_shares seem pretty low, especially considering that at least some of the other VMs on that CN have much higher cpu_shares.

"cpu_shares": 32,
"cpu_cap": 336,
"vcpus": 4,

Comment by Patrick Mooney [X]
Created at 2018-11-13T17:37:01.141Z

Please include details about whether this is KVM or bhyve (or both?) and what type of host you're running the workload on.


Comment by Marsell Kukuljevic
Created at 2018-11-13T17:41:19.588Z

Whoops, added the bhyve label.

By host you mean... what kind of CN in east3b? Is this a label?


Comment by Patrick Mooney [X]
Created at 2018-11-13T17:44:56.619Z

It wasn't immediately clear that this wasn't in nested virt or something. An east-3b CN should be fine for testing. Do similar hangs appear when running under KVM?


Comment by Marsell Kukuljevic
Created at 2018-11-13T17:49:30.681Z

I'll verify it today, but I'm reasonably sure this is specific to bhyve.


Comment by Marsell Kukuljevic
Created at 2018-11-14T09:53:25.192Z
Updated at 2018-11-14T09:54:51.096Z

Running mdata-get in a loop fetching a 1KB value left RDP nearly unconnectable after a few seconds of working. That on its own isn't particularly noteworthy, but the SAC (vmadm console) becomes nearly unusable as well: most keystrokes take 15s+ to appear.


Comment by Marsell Kukuljevic
Created at 2018-11-14T10:28:24.423Z
Updated at 2018-11-14T11:00:27.791Z

mdata-get under bhyve Ubuntu is far faster, but it too affects networking. I temporarily created a VM on the same CN with the following setup (after verifying the IP was free):

{
  "brand": "bhyve",
  "vcpus": 2,
  "autoboot": false,
  "ram": 2048,
  "disks": [ {
    "boot": true,
    "model": "virtio",
    "image_uuid": "6b47e1d9-36b8-4b6f-8764-5ff5fe6d120b",
    "image_size": 10240
  } ],
  "nics": [ {
    "nic_tag": "external",
    "model": "virtio",
    "network_uuid": "feb7b2c5-0063-42f0-a4e6-b812917397f7",
    "ip": "165.225.169.239",
    "netmask": "255.255.252.0",
    "gateway": "165.225.168.1",
    "vlan_id": 3105
  } ],
  "resolvers": ["1.1.1.1", "1.0.0.1"],
  "customer_metadata": {
    "root_authorized_keys":"ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAgEAowr+bg+nwKWXmsqKMBz6kEdVWcS1ki/9z7lml99IShZJjdqECBECvx9e1aIbkqzAa7I2ZqanBHfrw/xnlQPtVxVVLiUgBt/GT5YPMf/O4Kqh0C1nloErPu/vtGnVTk0/y+4gH1GyenwcORLjJR+eVhf098fkrklyESUm7nqjxPZFQWVRc7W5RbE2Cfgjts1H4ytyEAbL3NDrFSlhgT+ykZ3EHk72AoAg2/JEWG9MyaEp5jKJEOldKgXvQp4O9H62wkGOrONpQweneSqjHnxmYdkfS3pmdsZ42SspFRbkx8Q3zdxyWBtkPKruJTybNT+K4v5fL88Yx/HPazFjM8qvGbPITZgPvcw1XArJGUGFDfCTtIit6i0hHTKf5QaZUdhO5XjF5pmhxskEvc0DFoP1QzxXOo9awRR7BFfFAKTJoduXoBE4yT+OHRni6cCqeTw5tIaRLFLokaaZvzMTjFkM5sxWaxL0y9VE497ykMoq9DbPskyxmSzbXnfSV2VBIHsdRRzg8KmeDUpnwZxqezep1pnTk8kHxJ8JAAPw8qHDP1/NhLe0D9g5I/I1R2ZgshNbDSBtXkCueEiT9u66xlGWeF2Aqu08LmnoGrJ8uYp3rNlyKrFNTNXpbLim6QWHoyJmiRKcOprCXWvNo05Zck4mJXTuYUiAj4cY4f+fNRMbZ2E= marsell@joyent.com"
  }
}

This is the same network as the Windows VM above.

Then in one Ubuntu session:

ubuntu@b31b3f09-236a-e856-8d58-9cc06aa1499b:~$ export TIMEFORMAT=%R  # show real only
ubuntu@b31b3f09-236a-e856-8d58-9cc06aa1499b:~$ while true; do time mdata-get root_authorized_keys > /dev/null; done
0.467
4.345
5.205
6.144
7.208
6.514
9.377
2.164
4.726
5.715
1.424
5.649
2.440
5.064
8.371
5.689
5.607
6.640
5.671
8.284
POLLHUP
receive timeout, resetting protocol...
14.392
5.896
^C8.134

And in a separate terminal:

ubuntu@b31b3f09-236a-e856-8d58-9cc06aa1499b:~$ ping -i 1 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=59 time=0.403 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=59 time=0.403 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=59 time=0.368 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=59 time=0.382 ms    <- start mdata loop
64 bytes from 1.1.1.1: icmp_seq=5 ttl=59 time=0.396 ms
64 bytes from 1.1.1.1: icmp_seq=6 ttl=59 time=0.411 ms
64 bytes from 1.1.1.1: icmp_seq=7 ttl=59 time=0.400 ms
64 bytes from 1.1.1.1: icmp_seq=8 ttl=59 time=0.402 ms
64 bytes from 1.1.1.1: icmp_seq=9 ttl=59 time=0.390 ms
64 bytes from 1.1.1.1: icmp_seq=10 ttl=59 time=0.381 ms
64 bytes from 1.1.1.1: icmp_seq=11 ttl=59 time=0.382 ms
64 bytes from 1.1.1.1: icmp_seq=12 ttl=59 time=0.325 ms
64 bytes from 1.1.1.1: icmp_seq=13 ttl=59 time=0.290 ms
64 bytes from 1.1.1.1: icmp_seq=14 ttl=59 time=0.281 ms
64 bytes from 1.1.1.1: icmp_seq=15 ttl=59 time=0.304 ms
64 bytes from 1.1.1.1: icmp_seq=16 ttl=59 time=0.465 ms
64 bytes from 1.1.1.1: icmp_seq=17 ttl=59 time=8376 ms    <-
64 bytes from 1.1.1.1: icmp_seq=18 ttl=59 time=7368 ms
64 bytes from 1.1.1.1: icmp_seq=19 ttl=59 time=6360 ms
64 bytes from 1.1.1.1: icmp_seq=20 ttl=59 time=5352 ms
64 bytes from 1.1.1.1: icmp_seq=21 ttl=59 time=4344 ms
64 bytes from 1.1.1.1: icmp_seq=22 ttl=59 time=3336 ms
64 bytes from 1.1.1.1: icmp_seq=23 ttl=59 time=2328 ms
64 bytes from 1.1.1.1: icmp_seq=24 ttl=59 time=1320 ms
64 bytes from 1.1.1.1: icmp_seq=25 ttl=59 time=312 ms
64 bytes from 1.1.1.1: icmp_seq=26 ttl=59 time=0.366 ms
64 bytes from 1.1.1.1: icmp_seq=27 ttl=59 time=0.325 ms
64 bytes from 1.1.1.1: icmp_seq=28 ttl=59 time=0.340 ms
64 bytes from 1.1.1.1: icmp_seq=29 ttl=59 time=0.317 ms
64 bytes from 1.1.1.1: icmp_seq=30 ttl=59 time=0.373 ms
64 bytes from 1.1.1.1: icmp_seq=31 ttl=59 time=0.338 ms
64 bytes from 1.1.1.1: icmp_seq=32 ttl=59 time=0.355 ms
64 bytes from 1.1.1.1: icmp_seq=33 ttl=59 time=0.329 ms
64 bytes from 1.1.1.1: icmp_seq=34 ttl=59 time=0.307 ms
64 bytes from 1.1.1.1: icmp_seq=35 ttl=59 time=0.307 ms
64 bytes from 1.1.1.1: icmp_seq=36 ttl=59 time=0.353 ms
64 bytes from 1.1.1.1: icmp_seq=37 ttl=59 time=0.308 ms
64 bytes from 1.1.1.1: icmp_seq=38 ttl=59 time=0.314 ms
64 bytes from 1.1.1.1: icmp_seq=39 ttl=59 time=884 ms    <-
64 bytes from 1.1.1.1: icmp_seq=40 ttl=59 time=6954 ms    <-
64 bytes from 1.1.1.1: icmp_seq=41 ttl=59 time=5955 ms
64 bytes from 1.1.1.1: icmp_seq=42 ttl=59 time=0.313 ms
64 bytes from 1.1.1.1: icmp_seq=43 ttl=59 time=0.310 ms
64 bytes from 1.1.1.1: icmp_seq=44 ttl=59 time=0.318 ms    <- end mdata loop
64 bytes from 1.1.1.1: icmp_seq=45 ttl=59 time=0.400 ms
64 bytes from 1.1.1.1: icmp_seq=46 ttl=59 time=0.379 ms
64 bytes from 1.1.1.1: icmp_seq=47 ttl=59 time=0.389 ms
64 bytes from 1.1.1.1: icmp_seq=48 ttl=59 time=0.384 ms
64 bytes from 1.1.1.1: icmp_seq=49 ttl=59 time=0.412 ms
64 bytes from 1.1.1.1: icmp_seq=50 ttl=59 time=0.401 ms
64 bytes from 1.1.1.1: icmp_seq=51 ttl=59 time=0.391 ms
64 bytes from 1.1.1.1: icmp_seq=52 ttl=59 time=0.383 ms
64 bytes from 1.1.1.1: icmp_seq=53 ttl=59 time=0.382 ms
64 bytes from 1.1.1.1: icmp_seq=54 ttl=59 time=0.395 ms
64 bytes from 1.1.1.1: icmp_seq=55 ttl=59 time=0.409 ms
64 bytes from 1.1.1.1: icmp_seq=56 ttl=59 time=0.388 ms
64 bytes from 1.1.1.1: icmp_seq=57 ttl=59 time=0.396 ms
64 bytes from 1.1.1.1: icmp_seq=58 ttl=59 time=0.380 ms
64 bytes from 1.1.1.1: icmp_seq=59 ttl=59 time=0.393 ms
64 bytes from 1.1.1.1: icmp_seq=60 ttl=59 time=0.465 ms
64 bytes from 1.1.1.1: icmp_seq=61 ttl=59 time=0.368 ms
64 bytes from 1.1.1.1: icmp_seq=62 ttl=59 time=0.368 ms
64 bytes from 1.1.1.1: icmp_seq=63 ttl=59 time=0.404 ms
64 bytes from 1.1.1.1: icmp_seq=64 ttl=59 time=0.384 ms
64 bytes from 1.1.1.1: icmp_seq=65 ttl=59 time=0.375 ms
64 bytes from 1.1.1.1: icmp_seq=66 ttl=59 time=0.392 ms
64 bytes from 1.1.1.1: icmp_seq=67 ttl=59 time=0.386 ms
64 bytes from 1.1.1.1: icmp_seq=68 ttl=59 time=0.388 ms
64 bytes from 1.1.1.1: icmp_seq=69 ttl=59 time=0.383 ms
64 bytes from 1.1.1.1: icmp_seq=70 ttl=59 time=0.342 ms
64 bytes from 1.1.1.1: icmp_seq=71 ttl=59 time=0.390 ms
64 bytes from 1.1.1.1: icmp_seq=72 ttl=59 time=0.401 ms
64 bytes from 1.1.1.1: icmp_seq=73 ttl=59 time=0.393 ms
64 bytes from 1.1.1.1: icmp_seq=74 ttl=59 time=0.386 ms
64 bytes from 1.1.1.1: icmp_seq=75 ttl=59 time=0.371 ms
^C
--- 1.1.1.1 ping statistics ---
75 packets transmitted, 75 received, 0% packet loss, time 176201ms
rtt min/avg/max/mdev = 0.281/705.540/8376.192/1925.570 ms, pipe 9

The timers in Ubuntu are far less affected than in Windows, but spikes occasionally happen. Since Perl was available in the image by default, I used the following program:

use strict;
use Time::HiRes qw(time);

my $old = time;
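# each pass: sleep ~1s, then print the elapsed wall time of the iteration (should be ~1s)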
while (1) {
    sleep(1);
    my $new = time;
    printf "%.3f\n", $new - $old;
    $old = $new;
}

The results were the following, with mdata-get looping on a ~700-byte value in the background:

1.000
1.000
1.000
1.000
1.000
1.000
1.001
1.000
1.001
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000    <- start mdata loop
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.055    <-
1.000
1.000
1.325    <-
1.000
1.000
1.001
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
5.917    <-
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.498    <-
1.000
1.000
1.000
1.000
1.000
1.000
1.000        <- end of mdata loop
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000

I have left the Perl program running for several minutes without mdata running in the background, and the value is never larger than 1.001. This includes when running the following in the background:

while true; do ls -lah /bin > /dev/null; done
while true; do curl -s desacci.org 1> /dev/null; done

It doesn't deviate from 1.001, nor is ping affected, when looping mdata-get on a small (ten-byte) value.


Comment by Marsell Kukuljevic
Created at 2018-11-14T11:18:09.493Z
Updated at 2018-11-14T11:19:14.212Z

Ubuntu under bhyve really doesn't like large metadata values (in this case, 4KB):

ubuntu@b31b3f09-236a-e856-8d58-9cc06aa1499b:~$ mdata-put foo aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
ubuntu@b31b3f09-236a-e856-8d58-9cc06aa1499b:~$ mdata-get foo
POLLHUP
receive timeout, resetting protocol...
plat_recv timeout
plat_recv timeout
plat_recv timeout
receive timeout, resetting protocol...
plat_recv timeout
plat_recv timeout
receive timeout, resetting protocol...
^C

Keystrokes over SSH were slow, so networking was affected. vmadm console was worse -- keystrokes took 30s+ to register.

When mdata-get foo works, it's very slow:

ubuntu@b31b3f09-236a-e856-8d58-9cc06aa1499b:~$ while true; do time mdata-get foo > /dev/null; done
25.890
20.950
25.875
21.623
POLLHUP
receive timeout, resetting protocol...
WHAT, NO DATA?!
WHAT, NO DATA?!
WHAT, NO DATA?!
56.202
^C42.447

It should not take 20s+ to fetch 4KB of data.


Comment by Marsell Kukuljevic
Created at 2018-11-15T11:24:36.325Z
Updated at 2018-11-15T11:25:23.071Z

bhyve Windows:

The following results are solely for mdata-get of ten bytes. I did not test 4K.

Concurrent ping:

Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59 <- mdata loop start
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 165.225.169.239: Destination host unreachable.
Reply from 165.225.169.239: Destination host unreachable.
Reply from 165.225.169.239: Destination host unreachable.
Reply from 165.225.169.239: Destination host unreachable.
Reply from 165.225.169.239: Destination host unreachable.
Reply from 165.225.169.239: Destination host unreachable.
Reply from 165.225.169.239: Destination host unreachable.
Reply from 1.1.1.1: bytes=32 time=2766ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Request timed out.
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Reply from 1.1.1.1: bytes=32 time=379ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2906ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=15692ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Request timed out.
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=3609ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Request timed out.
Request timed out.
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=972ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2951ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Request timed out.
Reply from 165.225.169.239: Destination host unreachable.
Reply from 165.225.169.239: Destination host unreachable.
Reply from 1.1.1.1: bytes=32 time=520ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 1.1.1.1: bytes=32 time=186ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2193ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Request timed out.
Request timed out.
Request timed out.
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Request timed out.
Request timed out.
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=3532ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Request timed out.
Reply from 1.1.1.1: bytes=32 time=904ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1814ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Request timed out.
Request timed out.
Reply from 1.1.1.1: bytes=32 time=694ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=2ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59  <- mdata loop end
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59

Concurrent scheduling latency test:

1000
1016
1015
1000
1016
1000
1000
1015
1000
1016
1000
1016 <- mdata loop start
1086
1073
1670
1041
1002
4339
1013
1010
1012
1016
1036
1035
1022
1023
1002
1024
1024
4471
1008
1029
1015
1004
1038
1004
1013
1012
1024
1006
1010
1017
1018
1021
1018
1027
1037
1046
1030
1024
1030
1062
1017
1290
1020
1682
1074
1005
1020
34968
1000
1000
2970
5017
1186
1031
1044
1024
1089
1014
1875
1026
1010
1264
1013
4205
1037
8510
5504
1651
5075
2333
2835
1053
1051
3574
1016
1001
1056
1005
1011
1005
2898
1601
1175
1544
3669
1880
1561
1016
1015
1000
1020
1324
1016
1040
1004
1035
1014
1016
1000
1075
1006
1002
1005
1033
1015
1023
1007
1027
1017
1002
1019
1031
1028
1021
1040
1006
1043
1033
1018
1013
1047
1004
1013
1042
1018
1006
1001
1030
1006
1039
1012
1043
1005
1019
1021
1017
1003
1016
1007
1012
1025
1050
1006
1015
1001
4591
1451
1474
1015
1023
1239
1005
1062
1029
1025
1000
1006
1020
1019
1013
1036
1022
1029
1029
1007
1009
1008
1038
1011
1012
1007
1050
1038
1012
1001
1046
1052
1339
2007
5640
1133
1067
1067
1088
8285
2369
3391
1000
1000
1000
1012
1047
1041
1012
1033
1011
1010
1002
1029
1032
1036
1034
1014
1000
1016
1031
1017
1034
1049
1023
1017
1016
1026
1014
1045
1020
1006
1816
1923
1057
1015
1016
1437
1016
1013
1007
1033
1024
1001
1000
9297
4793
1277
1041
1014
1000
1000
1015
1016
1016
1039
1022
1026
1027
1030
1048
1033
1058
1026
1029
1026
1018
1004
1039
1012
1665
1007
1016
1000
1015
1016
1000
1015
1000
1016
1047  <- mdata loop end
1000
1000
1000
1016
1015
1016
1000
1000
1015

mdata loop used:

1..100 | % { (measure-command { .\mdata-get administrator_pw }).seconds }
0
49
5
0
0
34
0
0
0
0
3
10
0
0
1
0
3
4
1
26
5
0
5
0
4
8
1
0
0
0
0
0
0
0
1
0
0
0
3
0
0
0
0
0
26
17
0
0
0
0
0
7
0
26
3
2
18
5
0
0
0
0
0
13
0
0
0
0
0
12
3
1
0
0
1
0
0
0
4
0
0
8
5
3
0
0
0
0
0
0
0
0
0
0
15
1
0
0
0
0

RDP session dropped repeatedly, and SAC latency increased up to 20s+.


Comment by Marsell Kukuljevic
Created at 2018-11-15T11:26:45.862Z
Updated at 2018-11-15T11:27:17.982Z

KVM Windows:

The following results are for mdata-get of 10 bytes, but the 4K results were identical:

Concurrent ping:

Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59 <- mdata loop start
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59 <- mdata loop end
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time=1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59
Reply from 1.1.1.1: bytes=32 time<1ms TTL=59

Concurrent scheduling latency test:

1000
1016
1000
1015
1000
1016
1015
1016
1000
1000
1016 <- mdata loop start
1000
1015
1000
1016
1000
1016
1015
1000
1016
1000
1015
1000
1016
1016
1000
1015
1000
1016
1000
1000 <- mdata loop end
1016
1000
1015
1016
1015
1000
1000
1016
1016
1000

mdata loop used:

1..100 | % { (measure-command { .\mdata-get administrator_pw }).seconds }
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

RDP session did not drop.

I was unable to test vmadm console, since the KVM Windows image does not have SAC.


Comment by Marsell Kukuljevic
Created at 2018-11-15T11:29:55.369Z

bhyve Ubuntu:

ping and scheduling latency results were unaffected for mdata-get of ten bytes. vmadm console was affected slightly, with latency occasionally reaching 2s.

Upping to 1K did not affect the ping, but did somewhat affect the timer. The following results are for 1K:

Ping results omitted since they showed no deviation.

Concurrent scheduling latency test:

1.000
1.000
1.000
1.001
1.000
1.000
1.000
1.001
1.000
1.000
1.001 <- start mdata loop
1.000
3.162
2.201
3.007
1.000
1.000
1.003
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
6.140
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
4.257
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000  <- end mdata loop
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000

mdata loop used:

$ export TIMEFORMAT=%R
$ while true; do time /usr/sbin/mdata-get foo > /dev/null; done
2.972
2.202
3.007
2.398
2.404
2.396
2.325
2.330
POLLHUP
receive timeout, resetting protocol...
11.535
10.843
9.119
4.811
4.001
POLLHUP
receive timeout, resetting protocol...
20.203
1.999
1.956
2.029
2.350
2.347
2.335
2.369
2.406
2.398
19.574
10.919
2.410
2.375
2.331
2.335
POLLHUP
receive timeout, resetting protocol...
14.097
2.708
2.334
2.351
2.216
2.356
^C4.862

With 1K mdata-gets, vmadm console latency sometimes hit 8s.


Comment by Marsell Kukuljevic
Created at 2018-11-15T11:37:04.297Z

KVM Ubuntu:

The results are effectively a repeat of KVM Windows: for 10 bytes, 1K and 4K, ping and scheduling latency tests showed no deviation. If vmadm console was affected, the latency delta was in the noise.

The only difference between 10 bytes and 4K was that the former mdata-get took about 120ms to execute, while the latter took ~500ms. I don't think this is noteworthy.


Comment by Marsell Kukuljevic
Created at 2018-11-15T11:42:04.766Z

Summary:

Evidence indicates that KVM is unaffected by COM-port traffic, whereas bhyve is. Windows is affected much more severely than Ubuntu.

I don't have permissions for dtrace or lockstat on that CN, so I'm stopping here for now.


Comment by Patrick Mooney [X]
Created at 2018-11-15T20:00:20.937Z
Updated at 2018-11-15T20:00:52.199Z

I've been able to reproduce this locally using the win2016 image on a machine lacking APICv capability (Westmere). Even short mdata-get queries are able to induce the behavior. When the metadata request is issued, a clear signal is seen in mpstat:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   14  2104  101  175    0   13    4    0     3    0   1   0  99
  1    0   0    0 616471   67    4   89    1   55    0    91    0 100   0   0
  2    0   0    0   302   69  328    0   11    2    0     6    0   0   0 100
  3    0   0    0    27   13   68    0   21    1    0    22    0   0   0 100
  4    0   0    0     5    2    6    0    2    2    0     4    0   0   0 100
  5    0   0    0     3    1    2    0    1    1    0     1    0   0   0 100
  6    0   0    0    92    1   16   83    5   32    0 1095431   34  55   0  11
  7    0   0    0     9    4    4    0    1    3    0   381    0   0   0 100
  8    0   0    4     9    4    4    0    2    3    0     1    0   0   0 100
  9    0   0    0   125   63  127    0    2    1    0    64    1   0   0  99
 10    0   0    0     9    2   10    0    6    4    0   175    0   0   0 100
 11    0   0    0     4    2    9    0    2    4    0     2    0   0   0 100
 12    0   0    0     5    3    4    0    1    5    0     3    0   0   0 100
 13    0   0    0     5    2    2    0    1    2    0     1    0   0   0 100
 14    0   0    0     7    3   11    0    2    4    0     5    0   0   0 100
 15    0   0    0    22   12   76    0    2    9    0     5    0   0   0 100
 16    0   0    0    12   10    4    0    2    2    0     2    0   0   0 100
 17    0   0    0     3    1    2    0    1    2    0     1    0   0   0 100
 18    0   0    0     3    1    5    0    2    7    0     4    0   0   0 100
 19    0   0    0     7    3    8    0    3    4    0     5    0   0   0 100
 20    0   0    0   135    1    2   92    0    6    0    98    0 100   0   0
 21    0   0    0     3    1    4    0    1    1    0     7    0   0   0 100
 22    0   0    0     3    1    5    0    2    1    0     3    0   0   0 100
 23    0   0    0    17    2   65   10    3   14    0 139772    4   7   0  89

1. A high number of syscalls on one CPU (barring migration)
2. Roughly half that number of interrupts on another CPU

I confirmed that those interrupts were being delivered to the CPU hosting the vCPU for the instance in question. Those interrupts, of vector 0xa1 (poke_cpu), were causing rapid VMX exits with the expected EXT_INTR reason. Trying to trace the source of the poke_cpu() calls, even with light dtracing, introduced enough probe effect to disturb the race.


Comment by Patrick Mooney [X]
Created at 2018-11-15T20:18:41.887Z

It proved challenging to capture a solid trace of the poke_cpu() side of the problem, since even minor probe effect would disturb the race and cause the COM port to behave normally. That said, identifying which traces make the issue disappear is still valuable in narrowing down the investigation. By initiating fbt traces during these periods of high interrupt rates, causing them to immediately subside, and examining the residual results, I was led to vioapic_pulse_irq:

              vmm`vmmdev_do_ioctl+0x14c0
              vmm`vmm_ioctl+0x12c
              genunix`cdev_ioctl+0x39
              specfs`spec_ioctl+0x60
              genunix`fop_ioctl+0x55
              genunix`ioctl+0x9b
              unix`sys_syscall+0x19f

              libc.so.1`ioctl+0xa
              bhyve`lpc_uart_intr_assert+0x2c
              bhyve`uart_toggle_intr+0x29
              bhyve`uart_sock_drain+0x7d
              bhyve`uart_sock_client_event+0x18
              bhyve`uart_sock_thread+0x71
              libc.so.1`_thrp_setup+0x8a
              libc.so.1`_lwp_start
             3670

Even over just a fraction of a second, that's a surprising number of interrupts for a tiny UART transfer. Looking at uart_sock_drain, we get a clearer picture of what appears to be happening:

        if ((sc->mcr & MCR_LOOPBACK) != 0) {
                (void) read(sc->usc_sock.clifd, &ch, 1);
        } else {
                while (rxfifo_available(sc)) {
                        nbytes = read(sc->usc_sock.clifd, &ch, 1);
                        if (nbytes == 0) {
                                ret = 1;
                                break;
                        }
                        if (nbytes == -1 &&
                            errno != EINTR && errno != EAGAIN) {
                                ret = -1;
                                break;
                        }
                        if (nbytes == -1) {
                                break;
                        }

                        rxfifo_putchar(sc, ch);
                }
                uart_toggle_intr(sc);
        }

If the FIFO happens to be full, the socket read is understandably skipped, but the uart_toggle_intr() call is still made. So any time we enter uart_sock_drain() while the FIFO is full, an interrupt request is made regardless of whether any data was actually consumed from the socket. Why might we repeatedly call into this code? The answer is in uart_sock_thread:

        for (;;) {
                pollfds[0].fd = sc->usc_sock.clifd;
                pollfds[1].fd = sc->usc_sock.servfd;
                pollfds[0].revents = pollfds[1].revents = 0;

                res = poll(pollfds,
                    sizeof (pollfds) / sizeof (struct pollfd), -1);

                if (res == -1 && errno != EINTR) {
                        perror("poll failed");
                        /* we are hosed, close connection */
                        break;
                }

                /* event from client side */
                if (pollfds[0].revents) {
                        if (pollfds[0].revents &
                            (POLLIN | POLLRDNORM | POLLRDBAND | POLLPRI)) {
                                uart_sock_client_event(sc);
                        } else {
                                break;
                        }
                }
...
        }

If there's readable data on the socket, we'll attempt to drain it via uart_sock_client_event() -> uart_sock_drain(). This loop is totally ignorant of the FIFO status, so as long as data is waiting on the socket, every poll() call returns readable events for it. These repeated interrupts likely don't cause such nasty behavior on APICv-capable machines, since posted interrupts don't require the expensive VMX exit/entry process.
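
The actual fix isn't reproduced here, but the mechanism is easy to demonstrate in isolation. Below is a minimal, self-contained C sketch (hypothetical names such as fifo_t, drain() and fifo_has_room(); this is not the bhyve code or the patch) of one way to break the cycle: when the receive FIFO is full, drop the data fd from the poll set so a level-triggered poll() stops reporting readable events for data the FIFO cannot accept yet.

/*
 * Hypothetical standalone demo -- NOT the bhyve patch.  It shows why a
 * level-triggered poll() spins when the consumer-side FIFO is full, and
 * one way out: stop polling the data fd until the FIFO has room again.
 */
#include <poll.h>
#include <stdio.h>
#include <stdbool.h>
#include <unistd.h>

#define	FIFO_SIZE	16

typedef struct {
	char buf[FIFO_SIZE];
	int count;
} fifo_t;

static bool
fifo_has_room(const fifo_t *f)
{
	return (f->count < FIFO_SIZE);
}

/* Pull bytes from the fd into the FIFO, but only while there is room. */
static void
drain(int fd, fifo_t *f)
{
	char ch;

	while (fifo_has_room(f) && read(fd, &ch, 1) == 1)
		f->buf[f->count++] = ch;
	/* A real UART would assert its interrupt here, after real progress. */
}

int
main(void)
{
	int fds[2];
	fifo_t fifo = { .count = 0 };

	(void) pipe(fds);
	/* Queue more data than the FIFO can hold; nothing consumes the FIFO. */
	(void) write(fds[1], "0123456789abcdef0123456789abcdef", 32);

	for (int i = 0; i < 5; i++) {
		struct pollfd pfd = {
			/*
			 * When the FIFO is full, poll fd -1 (which poll()
			 * ignores) instead of the data fd.  Polling fds[0]
			 * unconditionally makes every poll() return POLLIN
			 * immediately even though drain() cannot make
			 * progress -- the spin described above.
			 */
			.fd = fifo_has_room(&fifo) ? fds[0] : -1,
			.events = POLLIN,
		};
		int res = poll(&pfd, 1, 100);

		(void) printf("iteration %d: poll=%d fifo=%d/%d\n",
		    i, res, fifo.count, FIFO_SIZE);
		if (res > 0 && (pfd.revents & POLLIN))
			drain(fds[0], &fifo);
	}
	return (0);
}

With the gating in place, the first iteration fills the FIFO and later iterations simply time out; removing the fifo_has_room() check from the pollfd setup reproduces the tight POLLIN/interrupt loop. The real fix in bhyve would need to do something equivalent in uart_sock_thread()/uart_sock_drain(), i.e. make the poll loop aware of FIFO occupancy; the commit referenced below is the authority on the actual change.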


Comment by Patrick Mooney [X]
Created at 2018-11-16T20:22:26.606Z

I tested this on a non-APICv host to replicate the conditions observed when the problem was manifesting itself. With the patch applied, there was no longer any tendency to spin on the UART interrupts. Additionally, pid-probe traces of the rxfifo_available() function showed bhyve immediately reacting to the full buffer condition instead of hammering the poll/interrupt loop.


Comment by Jira Bot
Created at 2018-11-26T16:01:37.082Z

illumos-joyent commit bb7f525820fb9ee32d8b59ed1b4c53c0624d8598 (branch master, by Patrick Mooney)

OS-7369 full UART FIFO can send bhyve into a spin
OS-7381 bhyve should handle UART socket reconnect
Reviewed by: Hans Rosenfeld <hans.rosenfeld@joyent.com>
Reviewed by: Mike Gerdts <mike.gerdts@joyent.com>
Approved by: Mike Gerdts <mike.gerdts@joyent.com>