OS-2680: use of ifconfig by network services breaks NTP

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2013-12-27T23:45:12.000Z)

Fix Versions

2014-01-09 Fruit F'er (Release Date: 2014-01-09)

Description

During the installation of SDC the HN reboots after creating the ZPOOL. On restart the NTP service goes into maintenance every time (multiple installation attempts) and the setup of core services fails in various places.

The workaround is to immediately login and run svcadm clear ntp. NTP will restart ok.

Looking at the logs we can see that networking is ready before NTP tries to connect but it fails nonetheless.

/var/svc/log/network-ntp:default.log shows the initial failure then a successful restart a few seconds later.

[ Nov 18 10:53:32 Executing start method ("/lib/svc/method/ntp start"). ]
Error: Servers/peers specified but none is reachable.
[ Nov 18 10:53:32 Method "start" exited with status 96. ]
[ Nov 18 10:53:47 Leaving maintenance because clear requested. ]
[ Nov 18 10:53:47 Enabled. ]
[ Nov 18 10:53:47 Executing start method ("/lib/svc/method/ntp start"). ]
[ Nov 18 10:53:47 Method "start" exited with status 0. ]

/var/svc/log/network-physical:default.log shows networking is established 7 seconds before NTP tries to connect

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
bnx0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 10.0.1.7 netmask ffffff00 broadcast 10.0.1.255
        ether 78:2b:cb:a:79:a 
external0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet V.A.L.ID netmask ffffffc0 broadcast V.A.L.ID
        ether 2:8:20:64:b8:3d 
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128 
== debug end: after ==
+ smf_is_globalzone
+ [ global = global ]
+ return 0
+ /usr/bin/sysinfo -u
+ /usr/sbin/ndd -set /dev/ip ip_strict_src_multihoming 1
+ [[ -n '' ]]
+ /sbin/ifconfig -a4u
+ [[ -n %%CONTAINER%%#39;lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1\n\tinet 127.0.0.1 netmask ff000000 \nbnx0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv
+ /sbin/ifconfig -a4u
+ read intf addr rest
+ [[ lo0: == inet ]]
+ read intf addr rest
+ [[ inet == inet ]]
+ [[ 127.0.0.1 != 127.0.0.1 ]]
+ read intf addr rest
+ [[ bnx0: == inet ]]
+ read intf addr rest
+ [[ inet == inet ]]
+ [[ 10.0.1.7 != 127.0.0.1 ]]
+ [[ 10.0.1.7 != 0.0.0.0 ]]
+ exit 0
[ Nov 18 10:53:25 Method "start" exited with status 0. ]

Comments

Comment by Peter Gale [X]
Created at 2013-12-12T16:13:48.000Z
Updated at 2017-12-14T17:34:39.125Z
This bug is a complete pain. If you get distracted and fail to clear NTP the install fails and a factoryreset is needed.
I cant justify changing the Priority but it would be great to get this fixed.

Comment by Keith Wesolowski [X]
Created at 2013-12-27T06:07:30.000Z
As expected, the problem here is that although we've depended upon every network service and those services have started successfully, the network isn't actually up yet when we go to run ntpdate. The width of the window here is shockingly small -- about 245ms in my testbed -- which explains why more intrusive attempts to observe this behaviour invariably eliminate it.

The proof comes to us courtesy of anonymous tracing, specifically the following simple D script:

fbt::rts_new_rtsmsg:entry
{
        printf("rtmsg time=%d cmd=%d err=%d ipif=%p flags=%d\n",
            timestamp, args[0], args[1], args[2], args[3]);
}

proc:::exec-success
/execname == "ntpdate"/
{
        printf("ntpdate started time=%d\n", timestamp);
}

This examines routing socket messages, created by various IP code when topology changes occur, including adding and removal of interfaces and routes. The output is fairly revealing:

  3  34426             rts_new_rtsmsg:entry rtmsg time=205491155043511 cmd=12 err=0 ipif=ffffff23eeec28b0 flags=3
  3  34426             rts_new_rtsmsg:entry rtmsg time=205491155052211 cmd=1 err=0 ipif=ffffff23eeec28b0 flags=3
  1  34426             rts_new_rtsmsg:entry rtmsg time=205491158015876 cmd=12 err=0 ipif=ffffff23ee846440 flags=3
  1  34426             rts_new_rtsmsg:entry rtmsg time=205491158019354 cmd=1 err=0 ipif=ffffff23ee846440 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494777666304 cmd=2 err=0 ipif=ffffff23ebeb3828 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494777670846 cmd=13 err=0 ipif=ffffff23ebeb3828 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494777675348 cmd=15 err=0 ipif=ffffff23ebeb3828 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494777717519 cmd=12 err=0 ipif=ffffff23ebeb3828 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494777727399 cmd=1 err=0 ipif=ffffff23ebeb3828 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494778398299 cmd=2 err=0 ipif=ffffff23eeec28b0 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494778401312 cmd=13 err=0 ipif=ffffff23eeec28b0 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494778404523 cmd=15 err=0 ipif=ffffff23eeec28b0 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494778965956 cmd=2 err=0 ipif=ffffff23ee846440 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494778968745 cmd=13 err=0 ipif=ffffff23ee846440 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=205494778971331 cmd=15 err=0 ipif=ffffff23ee846440 flags=3
 11  31978         exec_common:exec-success ntpdate started time=205495123950406
 23  34426             rts_new_rtsmsg:entry rtmsg time=205495252025779 cmd=12 err=0 ipif=ffffff23ee846440 flags=3
 23  34426             rts_new_rtsmsg:entry rtmsg time=205495252037613 cmd=1 err=0 ipif=ffffff23ee846440 flags=3
  3  34426             rts_new_rtsmsg:entry rtmsg time=205495368032351 cmd=12 err=0 ipif=ffffff23eeec28b0 flags=3
  3  34426             rts_new_rtsmsg:entry rtmsg time=205495368038123 cmd=1 err=0 ipif=ffffff23eeec28b0 flags=3
 29  34426             rts_new_rtsmsg:entry rtmsg time=205507307244758 cmd=15 err=0 ipif=ffffff24310016d8 flags=3

Routing messages of type 1 and 12 are associated with the addition of routes and addresses, respectively (see route(7p)), so the 12-1 sequence is what we expect when an interface is brought up: the address is added, and then a route is created for the new on-link network. We see several of these sequences prior to ntpdate's execution, but critically we also see 2 more such sequences after it. Note that the next message after those is 12 seconds later (and, by inspecting the associated ipif_t, we can see that they are associated with interfaces in a netstack other than the one used by the GZ).

The last GZ RTM_ADD here occurs at 205495.368038123, or about 244ms after ntpdate was executed at 205495.123950406. This race window is sufficiently narrow that even a more intrusive enabling (e.g., capturing fbt:::return from within the connect(3socket) syscall made by ntpdate, which fails and dooms the service) is sufficient to delay ntpdate's execution until the new addresses and routes are added. Obviously, the size of this race window will vary from system to system, and it's possible that some systems may not exhibit the behaviour at all.

We might also be curious where these routing messages are originating from; after all, if some service is triggering them, we probably need to depend on it. Unfortunately, they're completely asynchronous:

 16  34426             rts_new_rtsmsg:entry rtmsg time=206995482036538 cmd=12 err=0 ipif=ffffff23f0ea6020 flags=3
pid=0 name=sched

              ip`ip_rts_newaddrmsg+0xe2
              ip`ipif_up_notify+0x31
              ip`nce_timer+0x84d
              genunix`callout_list_expire+0x98
              genunix`callout_expire+0x3b
              genunix`callout_execute+0x20
              genunix`taskq_thread+0x2d0
              unix`thread_start+0x8

 16  34426             rts_new_rtsmsg:entry rtmsg time=206995482046354 cmd=1 err=0 ipif=ffffff23f0ea6020 flags=3
pid=0 name=sched

              ip`ip_rts_newaddrmsg+0xf4
              ip`ipif_up_notify+0x31
              ip`nce_timer+0x84d
              genunix`callout_list_expire+0x98
              genunix`callout_expire+0x3b
              genunix`callout_execute+0x20
              genunix`taskq_thread+0x2d0
              unix`thread_start+0x8

It seems the two salient questions here are (1) is there some particular reason that a considerable length of time elapses between network configuration and the completion of that activity, and (2) is it reasonable to block the completion of one or more of the network services until it occurs (ideally by means of a -s option for ipadm, dladm, etc., or failing that by a separate program that waits for pending configuration change requests to complete)? I recall that ak_net.c did a considerable amount of work based on these routing messages, but the need for synchronisation seems to be new, and unanticipated by the system's designers.

Comment by Keith Wesolowski [X]
Created at 2013-12-27T17:08:20.000Z
This delay is part of the duplicate address detection logic. See RFC 2462 and its successor 4861. This behaviour is valuable and will be absolutely necessary for IPv6, so disabling or removing it is really not an option.

Comment by Keith Wesolowski [X]
Created at 2013-12-27T22:58:34.000Z
This appears to be a behavioral difference between ifconfig(1m) and ipadm(1m). libipadm contains a function, i_ipadm_dad_wait(), that is used in the path where we assign an address to an interface as via create-addr. This does more or less what we want; it waits for routing messages to indicate that the address has been added. Unfortunately its failure mode is that it times out, and it doesn't bother to wait for the RTM_ADD, but from a practical perspective this is still pretty good. ifconfig, however, has no equivalent. As such, we see the following sequences of events:

Good (ipadm, note that we do not exit until the 12-1 pair of messages):

[root@headnode (sf-4) ~]# ipadm create-addr -t -T static -a 172.20.4.99/16 external0/v4
  8  34426             rts_new_rtsmsg:entry rtmsg time=265473849346674 cmd=15 err=0 ipif=ffffff24494675c8 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=265473849629781 cmd=2 err=0 ipif=ffffff24494675c8 flags=3
  8  34426             rts_new_rtsmsg:entry rtmsg time=265473849650372 cmd=13 err=0 ipif=ffffff24494675c8 flags=3
  0  34426             rts_new_rtsmsg:entry rtmsg time=265474408033224 cmd=12 err=0 ipif=ffffff24494675c8 flags=3
  0  34426             rts_new_rtsmsg:entry rtmsg time=265474408073427 cmd=1 err=0 ipif=ffffff24494675c8 flags=3
  8  31975                   proc_exit:exit ipadm exit time=265474409080532

Bad (ifconfig, no such synchronisation):

[root@headnode (sf-4) ~]# ifconfig external0:2 172.20.4.102 netmask 255.255.0.0 up
[root@headnode (sf-4) ~]#  
 16  34426             rts_new_rtsmsg:entry rtmsg time=265566095024691 cmd=15 err=0 ipif=ffffff2446f12718 flags=3
 16  34426             rts_new_rtsmsg:entry rtmsg time=265566096092127 cmd=2 err=0 ipif=ffffff2446f12718 flags=3
 16  34426             rts_new_rtsmsg:entry rtmsg time=265566096113668 cmd=13 err=0 ipif=ffffff2446f12718 flags=3
 16  31975                   proc_exit:exit ifconfig exit time=265566096378848
 21  34426             rts_new_rtsmsg:entry rtmsg time=265566685027939 cmd=12 err=0 ipif=ffffff2446f12718 flags=3
 21  34426             rts_new_rtsmsg:entry rtmsg time=265566685057212 cmd=1 err=0 ipif=ffffff2446f12718 flags=3

We see the last two messages -- the ones indicating that we're actually done -- about 590ms after ifconfig has exited. So any SMF service that brings up network interfaces using ifconfig cannot be used as a meaningful dependency by other services expecting those interfaces to be up.

Putting it all together, we find that the specific sequence of events looks like this:

 31  31975                   proc_exit:exit ifconfig /sbin/ifconfig external0 inet 172.20.4.0 netmask 255.255.0.0 up exiting time=266383684422015
 31  31978         exec_common:exec-success ifconfig /sbin/ifconfig -a started time=266383822145014
 31  31975                   proc_exit:exit ifconfig /sbin/ifconfig -a exiting time=266383824667636
  2  34426             rts_new_rtsmsg:entry rtmsg time=266384234031699 cmd=12 err=0 ipif=ffffff23f1251b90 flags=3
  2  34426             rts_new_rtsmsg:entry rtmsg time=266384234040789 cmd=1 err=0 ipif=ffffff23f1251b90 flags=3
  1  34426             rts_new_rtsmsg:entry rtmsg time=266384305028128 cmd=12 err=0 ipif=ffffff23ee67fdc8 flags=3
  1  34426             rts_new_rtsmsg:entry rtmsg time=266384305042185 cmd=1 err=0 ipif=ffffff23ee67fdc8 flags=3
…

  8  31978         exec_common:exec-success ifconfig ifconfig external0 started time=266396043257442
  8  31975                   proc_exit:exit ifconfig ifconfig external0 exiting time=266396045153226
 30  31978         exec_common:exec-success ifconfig /usr/sbin/ifconfig -auD4 netmask + broadcast + started time=266396675640376
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396680268494 cmd=2 err=0 ipif=ffffff23ebfae3e8 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396680282899 cmd=13 err=0 ipif=ffffff23ebfae3e8 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396680304481 cmd=15 err=0 ipif=ffffff23ebfae3e8 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396680420990 cmd=12 err=0 ipif=ffffff23ebfae3e8 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396680426517 cmd=1 err=0 ipif=ffffff23ebfae3e8 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396681748175 cmd=2 err=0 ipif=ffffff23f1251b90 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396681754242 cmd=13 err=0 ipif=ffffff23f1251b90 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396681759629 cmd=15 err=0 ipif=ffffff23f1251b90 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396682915457 cmd=2 err=0 ipif=ffffff23ee67fdc8 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396682936018 cmd=13 err=0 ipif=ffffff23ee67fdc8 flags=3
 30  34426             rts_new_rtsmsg:entry rtmsg time=266396682942852 cmd=15 err=0 ipif=ffffff23ee67fdc8 flags=3
 30  31975                   proc_exit:exit ifconfig /usr/sbin/ifconfig -auD4 netmask + broadcast + exiting time=266396683138162
  9  31978         exec_common:exec-success ntpdate started time=266397046644147
  4  34426             rts_new_rtsmsg:entry rtmsg time=266397128048668 cmd=12 err=0 ipif=ffffff23ee67fdc8 flags=3
  4  34426             rts_new_rtsmsg:entry rtmsg time=266397128059992 cmd=1 err=0 ipif=ffffff23ee67fdc8 flags=3
 31  34426             rts_new_rtsmsg:entry rtmsg time=266397279056183 cmd=12 err=0 ipif=ffffff23f1251b90 flags=3
 31  34426             rts_new_rtsmsg:entry rtmsg time=266397279068099 cmd=1 err=0 ipif=ffffff23f1251b90 flags=3

Note that the last time we actually did much of anything to external0 was around 13 seconds before the rest of this. The likely culprit, therefore, is svc:/network/netmask, which has as its sole function the execution of 'ifconfig -auD4 netmask + broadcast +', intended to pick up netmasks from NIS of all places. We can confirm that this does indeed result in a flood of routing messages, and that the last of these messages arrive over 500ms after the command completes:

[root@headnode (sf-4) ~]# ifconfig ifconfig -auD4 netmask + broadcast + started time=267161414506615
rtmsg time=267161420362377 cmd=2 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267161420374260 cmd=13 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267161420388860 cmd=15 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267161420505829 cmd=12 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267161420512510 cmd=1 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267161422090883 cmd=2 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267161422101826 cmd=13 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267161422109500 cmd=15 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267161423478512 cmd=2 err=0 ipif=ffffff23ee67fdc8 flags=3
rtmsg time=267161423490040 cmd=13 err=0 ipif=ffffff23ee67fdc8 flags=3
rtmsg time=267161423497283 cmd=15 err=0 ipif=ffffff23ee67fdc8 flags=3
ifconfig ifconfig -auD4 netmask + broadcast + exiting time=267161423733301
rtmsg time=267161996042031 cmd=12 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267161996054957 cmd=1 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267162060061994 cmd=12 err=0 ipif=ffffff23ee67fdc8 flags=3
rtmsg time=267162060082835 cmd=1 err=0 ipif=ffffff23ee67fdc8 flags=3

And, proving it, we can even run the very same ntpdate command we run in /lib/svc/method/ntp immediately after ifconfig terminates, and see that it does indeed fail:

[root@headnode (sf-4) ~]# ifconfig -auD4 netmask + broadcast +; ntpdate -q -u -p 1 -t 5 -- 132.246.11.228
27 Dec 22:47:37 ntpdate[18129]: no servers can be used, exiting
[root@headnode (sf-4) ~]# ifconfig ifconfig -auD4 netmask + broadcast + started time=267311931782978
rtmsg time=267311938071862 cmd=2 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267311938083938 cmd=13 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267311938096198 cmd=15 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267311938211740 cmd=12 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267311938218064 cmd=1 err=0 ipif=ffffff23ebfae3e8 flags=3
rtmsg time=267311939794568 cmd=2 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267311939802051 cmd=13 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267311939810401 cmd=15 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267311941204885 cmd=2 err=0 ipif=ffffff23ee67fdc8 flags=3
rtmsg time=267311941213385 cmd=13 err=0 ipif=ffffff23ee67fdc8 flags=3
rtmsg time=267311941220248 cmd=15 err=0 ipif=ffffff23ee67fdc8 flags=3
ifconfig ifconfig -auD4 netmask + broadcast + exiting time=267311941450132
ntpdate started time=267311943639139
rtmsg time=267312407040495 cmd=12 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267312407052165 cmd=1 err=0 ipif=ffffff23f1251b90 flags=3
rtmsg time=267312474043211 cmd=12 err=0 ipif=ffffff23ee67fdc8 flags=3
rtmsg time=267312474051398 cmd=1 err=0 ipif=ffffff23ee67fdc8 flags=3

Thus, the use of ifconfig here by /lib/svc/method/net-netmask is condemned. This entire service is useless, however, at least in the GZ and almost certainly everywhere else; SDC doesn't support NIS and never will. Therefore the simple, low-risk change here is to turn it off. That makes this an OS bug.

Comment by Bot Bot [X]
Created at 2013-12-27T23:44:43.000Z

smartos-live commit 020fe6e (branch master, by Keith M Wesolowski)

OS-2680 use of ifconfig by network services breaks NTP