OS-5176: vmadm stop and reboot may propagate improper error

Details

Issue Type:	Bug
Priority:	3 - Elevated
Status:	Resolved
Created at:	2016-02-22T15:24:20.000Z
Updated at:	2019-05-24T18:14:59.876Z

People

Created by:	Former user
Reported by:	Former user
Assigned to:	Former user

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2019-05-24T18:14:50.713Z)

Fix Versions

2019-06-06 TGS (Release Date: 2019-06-06)

Related Issues

is duplicated by OS-5271 vmadm stop fails on CentOS-7 on PI 20160317
relates to IMAGE-1172 debian 9 lx instances take an extra 90 seconds during reboot and stop/shutdown
is related to OS-6950 zones can fail to terminate when issued the shutdown command
is related to OS-7800 vmadm reboot/stop miss zone going to installed state

Labels

lxbrand

Description

With CentOS 7.2 (and later) lx images, vmadm shutdown and vmadm reboot fail.

[root@buglets ~]# vmadm stop aa7a67d8-ffd0-cf86-a872-d69cec2776e5
first of 1 error: Command failed:
Session terminated, killing shell... ...killed.

vmadm exits with a non-zero exit code before the shutdown completes.

Comments

Comment by Former user
Created at 2016-03-29T21:40:00.000Z
Updated at 2017-12-14T17:24:23.113Z

This is not a bug in the lx emulation. Here is the behavior from a trace (with extraneous lines removed). The su command forks/execs the shutdown command and waits for it to exit. However, while it is waiting, systemd sends a SIGTERM to su because of the shutdown. That interrupts the wait4 syscall. This causes the code to exit 1 instead of 0.

su 164515 1 -> clone(0x1200011, 0x0 0x0, 0x7fffff010a10 0x7fffff010740)
...
su 164515 1 -> wait4(164519, 0x7fffffebf9f0, 0x2, 0x0)
...
systemd 163792 1 send signal: 15 to: su pid 164515, tid 1
systemd 163792 1 <- kill(164515, 15)=0
su 164515 1 <- wait4=-4
su 164515 1 handle signal: 15
su 164515 1 <- write(2 \nSession terminated, killing shell...\0 37)=37
...
su 164515 1 -> nanosleep: 2:0
su 164515 1 <- nanosleep=0
su 164515 1 <- kill(164519, 9)=0
su 164515 1 <- write(2  ...killed.\n\024\033\0 12)=12
su 164515 1 <- close(1)=0
su 164515 1 <- close(2)=0
su 164515 -> rexit(1)

The source for this is in the Centos 7.2 util-linux-2.23.2 package in util-linux-2.23.2/login-utils/su-common.c in the create_watching_parent function. This is the code block where it is waiting for the child.

  if (!caught_signal)
    {
      pid_t pid;
      for (;;)
        {
          pid = waitpid (child, &status, WUNTRACED);

          if (pid != (pid_t)-1 && WIFSTOPPED (status))
            {
              kill (getpid (), SIGSTOP);
              /* once we get here, we must have resumed */
              kill (pid, SIGCONT);
            }
          else
            break;
        }
      if (pid != (pid_t)-1)
        if (WIFSIGNALED (status))
          {
            status = WTERMSIG (status) + 128;
            if (WCOREDUMP (status))
              fprintf (stderr, _("%s (core dumped)\n"),
                 strsignal (WTERMSIG (status)));
          }
        else
          status = WEXITSTATUS (status);
      else
        status = 1;                       <----- here is where we set status to 1 because of the interrupted syscall
    }
  else
    status = 1;

  if (caught_signal)
    {
      fprintf (stderr, _("\nSession terminated, killing shell..."));
      kill (child, SIGTERM);
    }

  cleanup_pam (PAM_SUCCESS);

  if (caught_signal)
    {
      sleep (2);
      kill (child, SIGKILL);
      fprintf (stderr, _(" ...killed.\n"));
    }
  exit (status);

We can see that since the waitpid call was interrupted that the code sets status to 1. We can see how it then does the sleep, kill, etc. before exiting.

This is simply how su is implemented here. If there is a bug, which is debatable, it is in vmadm assuming that the shutdown its running cannot return a non-zero result. However, in this case that is exactly what su will do because of the interrupted syscall.

Comment by Josh Wilsdon
Created at 2017-08-22T17:31:57.000Z
Updated at 2018-05-07T17:50:59.055Z

I wonder whether there's some appetite for changing to using a stop and reboot that don't call shutdown. That we could have stronger confidence in working. Those would also be faster since we'd not need to wait for something that might not happen. Alternatively, I guess we could ignore exit status from shutdown. In that case, we'd eventually time out and halt the zone. It'd just take a few minutes every time.

Comment by Former user
Created at 2017-08-22T18:40:08.000Z

I'd prefer the latter (ignore exit status and time out) as it is closer to what users would expect. They still have the option to stop the instance before creating an image out of it.

Comment by Former user
Created at 2017-08-22T18:43:22.000Z
Updated at 2017-12-14T17:24:34.375Z

Perhaps we can use a shorter timeout if we get a non-zero exit or perhaps adjust this by brand?

Comment by Josh Wilsdon
Created at 2017-08-22T21:18:32.000Z
Updated at 2018-05-07T17:50:59.541Z

@accountid:62561aa213c2c8006a35f07b: that should be doable. Good suggestion!

Comment by Former user
Created at 2019-05-14T19:21:16.399Z
Updated at 2019-05-14T19:32:02.090Z

Interestingly, zoneadm shutdown and zoneadm reboot (introduced a couple years after the relevant vmadm shutdown/reboot code) do not have this problem. It would probably be best to lean on the features that are core to illumos rather than replicating them in vmadm.

Comment by Former user
Created at 2019-05-15T22:41:05.444Z

I initially went down the path of using zoneadm reboot and zoneadm shutdown but came to realize that they weren't really great fits.

zoneadm reboot is the equivalent of zoneadm halt followed by zoneadm boot. That is, it does not perform a graceful reboot.

zoneadm shutdown calls /sbin/init 0, which works great usually. It does not work, however, with Alpine 3 lx zones. In Alpine, /sbin/init is busybox. It insists on only running if its pid is 1. Adding a shutdown brand hook to invoke /sbin/shutdown -h now triggered some race that caused the zone to get stuck in shutting_down. The zone would empty, but zone_state would be stuck at ZONE_IS_SHUTTING_DOWN, never making it to ZONE_EMPTY. After the shutdown timeout came, zoneadmd would give up and exit, leaving the zone in shutting_down. A subsequent zoneadm halt could clear it.

That leads us back to an earlier proposal: don't treat shutdown command errors as fatal. Rather, ignore errors during shutdown and wait for the zone state to reach installed before considering the stop as successful.

Comment by Former user
Created at 2019-05-16T16:06:16.547Z
Updated at 2019-05-16T21:31:33.665Z

Testing

A wide variety of images across all brands were tested. See related notes in OS-6950, OS-7800, and IMAGE-1172. The test-reboot-shutdown script was used for testing with a variety of images. Baseline results are in test-20190424T233834Z.out. Results with the fix are in test-fix.out.
parse_output can be used to print the results that PASS in a nice table.

The results (showing elapsed time) are summarized below.

Distro	Version	Brand	baseline reboot	baseline `stop`	baseline `stop -f`	fixed `reboot`	fixed `stop`	fixed `stop -f`
ubuntu-14.04	20150724	lx	3	10	2	3	10	2
ubuntu-16.04	20170403	lx	13	15	2	12	15	1
centos-6	20170407	lx	5	11	2	5	10	1
centos-7	20180323	lx	FAIL	\|	15	20	1
debian-8	20170214	lx	13	21	2	13	14	1
debian-9	20180404	lx	92	95	2	93	64	1
alpine-3	20170303	lx	6	11	2	6	10	1
base-64	19.1.0	joyent	6	11	2	6	10	1
base-64-lts	18.4.0	joyent	5	12	2	6	11	1
base-64-lts	17.4.0	joyent	6	12	2	6	11	1
ubuntu-certified-14.04	20180222	kvm	183	180	1	182	181	2
freebsd-11	20180213	kvm	182	181	2	182	181	2
centos-6	20170621	kvm	182	180	1	182	180	2
ubuntu-certified-16.04	20181005	kvm	183	180	1	184	181	2
ubuntu-certified-18.04	20190212.1	kvm	182	180	2	182	181	2
debian-8	20180819	kvm	182	180	1	182	180	2
centos-7	20181003	kvm	6	4	2	182	180	2
ubuntu-certified-16.04	20181005	bhyve	6	3	2	8	3	2
ubuntu-certified-18.04	20190212.1	bhyve	5	3	1	5	3	1
debian-8	20180819	bhyve	4	2	2	5	2	1
centos-7	20181003	bhyve	6	4	2	7	4	2

Comment by Former user
Created at 2019-05-16T19:56:21.772Z

The "halt after timeout" was verified to work as well. This is from vmadm stop of a debian 9 lx instance that suffers from IMAGE-1172.

[2019-05-16T19:41:25.499Z] DEBUG: vmadm/20419 on buglets: vminfod watchForEvent created (req_id=1afd7828-d21b-4e97-ef34-d49c6375ae8e, action=waitForZoneState, vm=7b5
0faf6-cc1e-6bfc-9657-842a3517b2d0)
    client: VM.js waitForZoneState (7b50faf6-cc1e-6bfc-9657-842a3517b2d0)
    --
    obj: {
      "uuid": "7b50faf6-cc1e-6bfc-9657-842a3517b2d0",
      "vm": {
        "zone_state": "installed"
      }
    }
    --
    opts: {
      "timeout": 60000,
      "catchErrors": true,
      "teardown": true
    }
[2019-05-16T19:42:25.501Z] ERROR: vmadm/20419 on buglets: vminfod watchForEvent "VM.js waitForZoneState (7b50faf6-cc1e-6bfc-9657-842a3517b2d0)" timeout exceeded - unmatched events (req_id=1afd7828-d21b-4e97-ef34-d49c6375ae8e, action=waitForZoneState, vm=7b50faf6-cc1e-6bfc-9657-842a3517b2d0, events=[])
    client: VM.js waitForZoneState (7b50faf6-cc1e-6bfc-9657-842a3517b2d0)
    --
    obj: {
      "uuid": "7b50faf6-cc1e-6bfc-9657-842a3517b2d0",
      "vm": {
        "zone_state": "installed"
      }
    }
[2019-05-16T19:42:25.502Z] DEBUG: vmadm/20419 on buglets: vminfod event stream stopped (req_id=1afd7828-d21b-4e97-ef34-d49c6375ae8e, action=waitForZoneState, vm=7b50faf6-cc1e-6bfc-9657-842a3517b2d0)
    client: VM.js waitForZoneState (7b50faf6-cc1e-6bfc-9657-842a3517b2d0)
[2019-05-16T19:42:25.503Z]  INFO: vmadm/20419 on buglets: Timeout waiting for shutdown to complete; halting. (req_id=1afd7828-d21b-4e97-ef34-d49c6375ae8e, action=stop, vm=7b50faf6-cc1e-6bfc-9657-842a3517b2d0)
[2019-05-16T19:42:25.503Z] DEBUG: vmadm/20419 on buglets: execFile (req_id=1afd7828-d21b-4e97-ef34-d49c6375ae8e, action=stop, vm=7b50faf6-cc1e-6bfc-9657-842a3517b2d0, cmd=/usr/sbin/zoneadm, opts={})
    args: [
      "-z",
      "7b50faf6-cc1e-6bfc-9657-842a3517b2d0",
      "halt"
    ]
[2019-05-16T19:42:25.580Z] DEBUG: vmadm/20419 on buglets: vminfod watchForChanges 3/3 changes remaining (req_id=1afd7828-d21b-4e97-ef34-d49c6375ae8e, action=stop, vm=7b50faf6-cc1e-6bfc-9657-842a3517b2d0, remaining=3, total=3)
    client: VM.js VM.stop (7b50faf6-cc1e-6bfc-9657-842a3517b2d0)
[2019-05-16T19:42:26.587Z] DEBUG: vmadm/20419 on buglets: vminfod watchForChanges 3/3 changes remaining (req_id=1afd7828-d21b-4e97-ef34-d49c6375ae8e, action=stop, vm=7b50faf6-cc1e-6bfc-9657-842a3517b2d0, remaining=3, total=3)
    client: VM.js VM.stop (7b50faf6-cc1e-6bfc-9657-842a3517b2d0)
[2019-05-16T19:42:26.616Z] DEBUG: vmadm/20419 on buglets: vminfod watchForChanges 2/3 changes remaining (req_id=1afd7828-d21b-4e97-ef34-d49c6375ae8e, action=stop, vm=7b50faf6-cc1e-6bfc-9657-842a3517b2d0, remaining=2, total=3)
    client: VM.js VM.stop (7b50faf6-cc1e-6bfc-9657-842a3517b2d0)
[2019-05-16T19:42:26.617Z] DEBUG: vmadm/20419 on buglets: halted zone (req_id=1afd7828-d21b-4e97-ef34-d49c6375ae8e, action=stop, vm=7b50faf6-cc1e-6bfc-9657-842a3517b2d0, stdout="")
    stderr: zone '7b50faf6-cc1e-6bfc-9657-842a3517b2d0': /zones/7b50faf6-cc1e-6bfc-9657-842a3517b2d0/root/checkpoints: No such file or directory

Comment by Jira Bot
Created at 2019-05-24T18:14:59.876Z

smartos-live commit 009dfe433c8ba38e5df14abb62c25a224f10ba33 (branch master, by Mike Gerdts)

OS-5176 vmadm stop and reboot may propagate improper error
Reviewed by: Pedro Palazón Candel <pedro@joyent.com>
Approved by: Pedro Palazón Candel <pedro@joyent.com>