OS-6611: bhyve guest fails to boot: UUID=... does not exist. Dropping to a shell!

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-02-14T23:19:03.745Z)

Description

This is one of two bugs originally reported in OS-6604. This bug covers the following boot failure:

[    0.000000] random: get_random_bytes called from start_kernel+0x42/0x4e6 with crng_init=0
[    0.000000] Linux version 4.13.0-25-generic (buildd@lgw01-amd64-039) (gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3)) #29-Ubuntu SMP Mon Jan 8 21:14:41 UTC 2018 (Ubuntu 4.13.0-25.29-generic 4.13.13)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.13.0-25-generic root=UUID=e792c995-2eba-45bb-9f91-46d2af71fb10 ro console=tty0 console=ttyS0,115200n8 tsc=reliable earlyprintk
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] Disabled fast string operations
...
[    8.343314] raid6: using avx2x2 recovery algorithm
[    8.348192] xor: automatically using best checksumming function   avx
[    8.356220] async_tx: api initialized (async)
done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... [    8.456092] Btrfs loaded, crc32c=crc32c-intel
Scanning for Btrfs filesystems
done.
Begin: Waiting for root file system ... Begin: Running /scripts/local-block ... mdadm: No devices listed in conf file were found.
done.
Begin: Running /scripts/local-block ... mdadm: No devices listed in conf file were found.
done.
...
Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  UUID=e792c995-2eba-45bb-9f91-46d2af71fb10 does not exist.  Dropping to a shell!


BusyBox v1.22.1 (Ubuntu 1:1.22.0-19ubuntu2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

Comments

Comment by Mike Gerdts
Created at 2018-02-14T16:30:36.636Z
In @mike.zeller's case, vmadm showed:

  "disks": [
    {
      "path": "/dev/zvol/rdsk/zones/eada85b5-3fab-e144-9f2b-95d78e977fdf-disk0",
      "boot": false,
      "model": "virtio",
      "media": "disk",
      "image_size": 10240,
      "image_uuid": "38396fc7-2472-416b-e61b-d833b32bd088",
      "zfs_filesystem": "zones/eada85b5-3fab-e144-9f2b-95d78e977fdf-disk0",
      "zpool": "zones",
      "size": 10240,
      "compression": "off",
      "refreservation": 10240,
      "block_size": 4096
    },
    {
      "path": "/dev/zvol/rdsk/zones/eada85b5-3fab-e144-9f2b-95d78e977fdf-disk1",
      "boot": false,
      "model": "virtio",
      "media": "disk",
      "zfs_filesystem": "zones/eada85b5-3fab-e144-9f2b-95d78e977fdf-disk1",
      "zpool": "zones",
      "size": 25600,
      "compression": "off",
      "refreservation": 25600,
      "block_size": 8192
    }
  ],

After adding `"boot": true` to the first disk, the guest booted normally.

If it is a requirement that one disk be marked as the boot disk, the boot should fail during a verification phase before zoneadmd attempts it. That said, it is not at all clear to me why the disk could not be found by UUID; there may be yet another bug here.
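The verification phase suggested above could be sketched as follows. This is a hypothetical Python illustration of the check, not the actual brand or zoneadmd code; the function names are made up for the example:

```python
# Hypothetical pre-boot validation sketch: reject a bhyve disk payload
# in which no disk is marked bootable. Not the actual brand code.

def find_boot_disk(disks):
    """Return the disk marked "boot": true, or None if there is none."""
    bootable = [d for d in disks if d.get("boot")]
    if len(bootable) > 1:
        raise ValueError("more than one disk marked as the boot disk")
    return bootable[0] if bootable else None

def verify_payload(disks):
    """Fail verification (as zoneadmd could, before booting the zone)
    when no disk is bootable."""
    if find_boot_disk(disks) is None:
        raise ValueError('no disk has "boot": true; the guest may not '
                         "find its root device")

# A payload like the one quoted above would fail this check:
disks = [{"path": ".../disk0", "boot": False},
         {"path": ".../disk1", "boot": False}]
```

With a check like this, the misconfiguration surfaces as a clear verification error instead of a hung guest at the (initramfs) prompt.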

Comment by Michael Zeller
Created at 2018-02-14T16:47:01.617Z
Just as a counterpoint, here is part of the `vmadm get uuid` output for a working KVM instance I have that shows both disks with `boot: false`:

{
  "zonename": "1356a735-5bbc-e723-e9f0-ca8fd8d79af4",
  "autoboot": false,
  "brand": "kvm",
  "limit_priv": "default,-file_link_any,-net_access,-proc_fork,-proc_info,-proc_session",
  "v": 1,
  "create_timestamp": "2018-01-20T06:09:52.601Z",
  "cpu_shares": 8,
  "max_lwps": 1000,
  "max_msg_ids": 4096,
  "max_sem_ids": 4096,
  "max_shm_ids": 4096,
  "max_shm_memory": 1024,
  "zfs_io_priority": 10,
  "max_physical_memory": 1280,
  "max_locked_memory": 1280,
  "max_swap": 2048,
  "cpu_cap": 200,
  "billing_id": "78838c27-2ac0-cbf7-db39-9419520429a6",
  "owner_uuid": "7c25b5f4-d0bc-4cd3-a79e-9bcd359535f1",
  "archive_on_delete": true,
  "disk_driver": "virtio",
  "nic_driver": "virtio",
  "resolvers": [
    "8.8.8.8",
    "8.8.4.4"
  ],
  "alias": "FBSD-Dev",
  "ram": 1024,
  "vcpus": 2,
  "cpu_type": "host",
  "disks": [
    {
      "path": "/dev/zvol/rdsk/zones/1356a735-5bbc-e723-e9f0-ca8fd8d79af4-disk0",
      "boot": false,
      "model": "virtio",
      "media": "disk",
      "image_size": 10240,
      "image_uuid": "d13bd654-41bb-11e7-a64c-b76d805afedf",
      "zfs_filesystem": "zones/1356a735-5bbc-e723-e9f0-ca8fd8d79af4-disk0",
      "zpool": "zones",
      "size": 10240,
      "compression": "off",
      "refreservation": 10240,
      "block_size": 8192
    },
    {
      "path": "/dev/zvol/rdsk/zones/1356a735-5bbc-e723-e9f0-ca8fd8d79af4-disk1",
      "boot": false,
      "model": "virtio",
      "media": "disk",
      "zfs_filesystem": "zones/1356a735-5bbc-e723-e9f0-ca8fd8d79af4-disk1",
      "zpool": "zones",
      "size": 25600,
      "compression": "off",
      "refreservation": 25600,
      "block_size": 8192
    }
  ],

Comment by Mike Gerdts
Created at 2018-02-14T17:26:26.651Z
I have a bhyve zone that boots properly. I then shut it down and removed `boot=true`:
[NOTICE: Zone halted]
~.
[Connection to zone '79062669-e229-e55d-960d-9b18d0fed8d0' console closed]
[root@emy-17 /root]# zonecfg -z $(vm test) info device
device:
	match: /dev/zvol/rdsk/zones/79062669-e229-e55d-960d-9b18d0fed8d0-disk0
	property: (name=boot,value="true")
	property: (name=model,value="virtio")
	property: (name=media,value="disk")
	property: (name=image-size,value="10240")
	property: (name=image-uuid,value="38396fc7-2472-416b-e61b-d833b32bd088")
[root@emy-17 /root]# zonecfg -z $(vm test)
zonecfg:79062669-e229-e55d-960d-9b18d0fed8d0> select device match=/dev/zvol/rdsk/zones/79062669-e229-e55d-960d-9b18d0fed8d0-disk0
zonecfg:79062669-e229-e55d-960d-9b18d0fed8d0:device> remove property (name=boot,value="true")
zonecfg:79062669-e229-e55d-960d-9b18d0fed8d0:device> end
zonecfg:79062669-e229-e55d-960d-9b18d0fed8d0> exit
On the next boot, I reproduced the problem described here. Once I got to the (initramfs) prompt:
(initramfs) cat /proc/partitions
major minor  #blocks  name

(initramfs) ls /dev/vd*
ls: /dev/vd*: No such file or directory
Looking at $zoneroot/tmp/zhyve.log, I see the zhyve command line for the current boot (which hung):
zhyve -H -B 1,product=SmartDC HVM -s 1,lpc \
    -l bootrom,/usr/share/bhyve/BHYVE_UEFI_CSM.fd -l com1,/dev/zconsole \
    -l com2,socket,/tmp/vm.ttyb -c 2 -m 2048 \
    -s 2:2,virtio-blk,/dev/zvol/rdsk/zones/79062669-e229-e55d-960d-9b18d0fed8d0-disk0 \
    -s 3:0,virtio-net-viona,net0 SYSbhyve-57
The previous boot (which succeeded) was:
zhyve -H -B 1,product=SmartDC HVM -s 1,lpc \
    -l bootrom,/usr/share/bhyve/BHYVE_UEFI_CSM.fd -l com1,/dev/zconsole \
    -l com2,socket,/tmp/vm.ttyb -c 2 -m 2048 \
    -s 2:0,virtio-blk,/dev/zvol/rdsk/zones/79062669-e229-e55d-960d-9b18d0fed8d0-disk0 \
    -s 3:0,virtio-net-viona,net0 SYSbhyve-57
Notice the slot:function for disk0. In the good case, function 0 is used. In the bad case, function 2 is used. This is somewhat intentional, but with bad side effects.
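The disappearing disk follows from how guests enumerate PCI devices: function 0 of each slot is probed first, and functions 1-7 are scanned only when function 0 is present and advertises the multifunction bit. A device at 2:2 with nothing at 2:0 is therefore invisible to the guest. A minimal model of that scan in Python (a simplified sketch, not bhyve's or the guest kernel's implementation):

```python
# Simplified model of a guest's PCI slot enumeration: probe function 0
# of each slot first; scan functions 1-7 only if function 0 exists
# (here we assume its multifunction bit is set whenever it is present).

def enumerate_devices(present):
    """present: dict mapping (slot, function) -> device name.
    Returns the subset of devices the guest would actually discover."""
    found = {}
    for slot in range(32):
        dev0 = present.get((slot, 0))
        if dev0 is None:
            continue            # no function 0 -> the whole slot is skipped
        found[(slot, 0)] = dev0
        for fn in range(1, 8):
            if (slot, fn) in present:
                found[(slot, fn)] = present[(slot, fn)]
    return found

# Failing boot: disk0 at 2:2 with no device at 2:0 -> never seen.
bad = {(2, 2): "virtio-blk disk0", (3, 0): "virtio-net-viona net0"}
# Working boot: disk0 at 2:0 -> discovered normally.
good = {(2, 0): "virtio-blk disk0", (3, 0): "virtio-net-viona net0"}
```

Running the model on the two configurations above shows the guest finding only the NIC in the failing case, which matches the empty /proc/partitions seen at the (initramfs) prompt.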

In usr/src/lib/brand/bhyve/zone/boot.c, the intention is to reserve function 0 for a bootable CD (if present) and function 1 for the boot disk. Clearly that is not what is happening, and it is questionable whether that scheme could even work as intended.

This logic needs to be reworked to ensure that we always have at least one disk at function 0, perhaps by spreading devices across different slots.
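One possible shape for the rework is to give each device its own slot at function 0, keeping the first slot for an optional boot CD. This is a hypothetical Python sketch of that assignment policy, not the actual boot.c change; FIRST_SLOT and the function name are assumptions for the example:

```python
# Hypothetical slot assignment sketch: every device gets function 0 of
# its own slot, so each one is visible to the guest's PCI scan.
# Not the actual boot.c logic.

FIRST_SLOT = 2   # assumption: earlier slots hold hostbridge and lpc

def assign_slots(disks, cdrom=None):
    """Return a list of (slot, function, device) assignments."""
    assigned = []
    slot = FIRST_SLOT
    if cdrom is not None:        # optional bootable CD gets the first slot
        assigned.append((slot, 0, cdrom))
        slot += 1
    # Boot disk first, then the remaining disks, each at function 0.
    ordered = sorted(disks, key=lambda d: not d.get("boot"))
    for d in ordered:
        assigned.append((slot, 0, d["path"]))
        slot += 1
    return assigned
```

Under this policy a disk never lands at a nonzero function of an otherwise empty slot, so the failure mode above cannot occur regardless of whether `boot` is set.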

Comment by Josh Wilsdon
Created at 2018-02-15T00:25:57.405Z
As pointed out in OS-6604, my workaround in TRITON-125 is here:

https://github.com/joyent/sdc-cloudapi/commit/1fd2a34089861c4b32c96bea7d2d8ff4a8df56cd

which ensures `boot: true` is always set on the root disk and avoids this problem.