|Priority:||4 - Normal|
|Created by:||Brian Bennett|
|Reported by:||Brian Bennett|
|Assigned to:||Brian Bennett|
We've run into a situation with Linux where after server setup the CN will reboot back into SmartOS with an unreadable zpool.
Here's what's going on:
At this point you can manually PUT the nic with the correct ownership in NAPI and reboot, and the server will get the correct Linux PI. I've also confirmed that net-agent only needs about 10s to do the nic adoption. For now we're working around this by putting in a sleep, but I believe this condition also exists for SmartOS; it's just never been noticed.
So why isn't it a problem for SmartOS? Because prior to the introduction of Linux CNs, servers were almost exclusively set up on the default PI. If net-agent doesn't finish, the server just boots the default PI that it was supposed to boot anyway. The zpool is readable, so the system boots as expected, and net-agent starts up and takes ownership of the nics. The next time the server is rebooted, whenever that is, it'll get the assigned PI. Worst case scenario, it boots the default PI, a different PI is assigned, server setup runs, it boots the default PI again, and net-agent does its thing. Operators are likely to think "huh, that was weird" and just reboot it again, never looking closely at it. In a SmartOS-only world, it's exceedingly rare (almost, but not quite, impossible*) to end up with a server that's been through setup and is not capable of booting properly.
On top of this, we do several more things after agentsetup.sh on SmartOS that we don't do on Linux, and I believe that gives SmartOS net-agent the time it needs. However, this problem may very well exist in reverse if your default PI is Linux and you want the occasional SmartOS CN.
Either way, the only way for the server to deterministically boot the intended platform is to have a workflow step that ensures the nics are properly owned in NAPI, and to stop leaving it up to chance.
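As a rough illustration of what such a workflow step might do, here is a minimal Python sketch that polls until every nic is owned by the server or a timeout expires. It is not the actual workflow code. The function name `wait_for_nic_adoption`, the injected `get_nic` callable, and the timeout and interval defaults are all hypothetical, and the sketch assumes NAPI nic objects expose `belongs_to_uuid` and `belongs_to_type` fields.

```python
import time
from typing import Callable, Iterable, Mapping, Optional


def wait_for_nic_adoption(
    server_uuid: str,
    macs: Iterable[str],
    get_nic: Callable[[str], Optional[Mapping[str, str]]],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every nic in `macs` is owned by `server_uuid`.

    `get_nic` is a hypothetical lookup (for example, a thin wrapper around a
    NAPI client) that returns the nic object for a MAC, or None if NAPI does
    not know the nic yet. Returns True once all nics are adopted, or False
    if the timeout expires first.
    """
    pending = set(macs)
    deadline = clock() + timeout_s
    while True:
        for mac in list(pending):
            nic = get_nic(mac)
            # Assumed field names; adjust to whatever NAPI actually returns.
            if (
                nic is not None
                and nic.get("belongs_to_uuid") == server_uuid
                and nic.get("belongs_to_type") == "server"
            ):
                pending.discard(mac)
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Unlike the fixed sleep, a step like this returns as soon as adoption is observed and fails loudly on timeout, so the setup workflow can refuse to reboot the server rather than reboot into the wrong PI.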
* It's actually the same trigger condition. A Linux CN's zfs supports options that SmartOS zfs doesn't. Likewise, if you have two versions of SmartOS where the version running at the time of setup has features that the default PI doesn't support, and net-agent doesn't have time to adopt the nics, you'll boot the default PI and not be able to import the zpool.