OS-1457: dladm won't show or create vnics

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2012-08-08T16:22:39.000Z
Updated at:2019-09-04T01:48:59.035Z

People

Created by:Robert Gulewich [X]
Reported by:Robert Gulewich [X]
Assigned to:Ryan Zezeski [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-01-31T16:00:00.000Z)

Fix Versions

2017-02-02 ICELAND ESCALATION (Release Date: 2017-02-02)

Description

On bh1-build2, dladm is pretty busted. Creating new vnics fails, and doing a 'dladm show-vnic' doesn't show any vnics (even though there are definitely many vnics on the system). This is in the Global Zone.

Digging into the show-vnic problem a bit, dladm is calling ioctl(dladm_dld_fd(handle), VNIC_IOC_INFO, &ioc), which is returning DLADM_STATUS_NOTFOUND. It appears as though i_dladm_vnic_info_active is being called with an invalid datalink_id_t:

[root@bh1-build2 (bh1) ~]# dtrace -Z -n 'pid$target::i_dladm_vnic_info_active:entry { print((datalink_id_t)arg1) }' -c 'dladm show-vnic'
dtrace: description 'pid$target::i_dladm_vnic_info_active:entry ' matched 0 probes
LINK         OVER       SPEED MACADDRESS        MACADDRTYPE VID  ZONE
net0         igb2       0     c0:ff:ee:8e:b5:24 fixed       0    5ee5a4c9-7180-4527-bb49-b1f34674183a
dtrace: pid 17377 has exited
CPU     ID                    FUNCTION:NAME
 13  72970   i_dladm_vnic_info_active:entry datalink_id_t 0x176

 13  72970   i_dladm_vnic_info_active:entry datalink_id_t 0x8041020

(0x176 is a valid link ID that exists in dlmgmtd, but 0x8041020 is not)

Comments

Comment by Robert Gulewich [X]
Created at 2012-08-08T16:29:51.000Z

The GZ's dlmgmtd only appears to know about some of the zones' vnics:

[root@bh1-build2 (bh1) ~]# mdb -p $(pgrep dlmgmtd)Loading modules: [ ld.so.1 libc.so.1 libavl.so.1 ]
> dlmgmt_id_avl::walk avl | ::print dlmgmt_link_t ll_linkid ll_link
ll_linkid = 0x1
ll_link = [ "igb2" ]
ll_linkid = 0x2
ll_link = [ "igb0" ]
ll_linkid = 0x3
ll_link = [ "igb3" ]
ll_linkid = 0x4
ll_link = [ "igb1" ]
ll_linkid = 0x176
ll_link = [ "net0" ]
ll_linkid = 0x177
ll_link = [ "net0" ]
> dlmgmt_name_avl::walk avl | ::print dlmgmt_link_t ll_linkid ll_link
ll_linkid = 0x2ll_link = [ "igb0" ]
ll_linkid = 0x4
ll_link = [ "igb1" ]
ll_linkid = 0x1
ll_link = [ "igb2" ]
ll_linkid = 0x3ll_link = [ "igb3" ]
ll_linkid = 0x177
ll_link = [ "net0" ]
ll_linkid = 0x176ll_link = [ "net0" ]

Comment by Robert Gulewich [X]
Created at 2012-08-08T16:59:35.000Z

The kernel's vnic_cache has a bunch of vnics in it:

> ffffff0d370dd5c0::modhash -e | ::print vnic_t vn_id
vn_id = 0x10
vn_id = 0x19a
vn_id = 0x11
vn_id = 0x16c
vn_id = 0x12
vn_id = 0x10f
vn_id = 0x13
vn_id = 0x14
vn_id = 0x83
vn_id = 0x15
vn_id = 0x45
vn_id = 0x6
vn_id = 0x16
vn_id = 0x26
vn_id = 0xc4
vn_id = 0x7
vn_id = 0x17
vn_id = 0x8
vn_id = 0x18
vn_id = 0x9
vn_id = 0x19
vn_id = 0xa
vn_id = 0x1a
vn_id = 0xb
vn_id = 0x1b
vn_id = 0x176
vn_id = 0xc
vn_id = 0x1c
vn_id = 0xd
vn_id = 0x158
vn_id = 0xe
vn_id = 0xcc

Comment by Robert Gulewich [X]
Created at 2012-08-13T16:41:50.000Z

The commands that cause dlmgmtd to bail:

dladm create-vnic  -t -l ixgbe0 -p zone=96a98bc3-f596-439a-a54a-db5b7d73b4ee -m 12:22:33:01:01:01 tmp687240
dladm create-vnic  -t -l ixgbe0 -p zone=96a98bc3-f596-439a-a54a-db5b7d73b4ee -m 12:22:32:01:01:01 tmp687240

Comment by Ryan Zezeski [X]
Created at 2016-12-29T17:13:22.000Z

The issue here is that the various datalink lists become out of sync
with when a create fails. The creation tries to undo previous work by
deleting the link but this goes bad if the new link name matches an
existing one.

There are three places that datalink information is kept:

Also there is the vnic_cache, specific to vnics, which I'll list
here as well since it was included in the original report. The issue
is that these various sources become inconsistent when a VNIC creation
for an existing name fails. I'm running on a modified platform which
includes my fixes for OS-3506 and OS-5363. The patches for these bugs
change how transient vnics are deleted (from zoneadmd instead of
dlmgmtd) but this issue still exists in previous platforms as the
original report is proof of.

Here are the steps to reproduce:

1. Start smartos zone.
2. Create new temp vnic and assign to zone.
3. List links from perspective of dladm, DLS, vnic_cache, dlmgmtd, and
zone list.
4. Try to create same vnic in step 2 but change mac addr.
5. Repeat step 3.
6. Try to stop smartos zone.

[root@testsos ~]# vmadm start e0b64261-4708-c1f3-971b-dd561161b3dc
Successfully started VM e0b64261-4708-c1f3-971b-dd561161b3dc
[root@testsos ~]# dladm create-vnic  -t -l e1000g0 -p zone=e0b64261-4708-c1f3-971b-dd561161b3dc -m 12:22:33:01:01:01 tmp687240

[root@testsos ~]# dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
net0        vnic      1500   ?        --         e1000g0
tmp687240   vnic      1500   up       --         e1000g0
[root@testsos ~]# echo "::walk dls_devnet_cache | ::print dls_devnet_t dd_linkid dd_linkname" | mdb -k
dd_linkid = 0x3
dd_linkname = [ "tmp687240" ]
dd_linkid = 0x2
dd_linkname = [ "net0" ]
dd_linkid = 0x1
dd_linkname = [ "e1000g0" ]
[root@testsos ~]# echo "::modhash -t ! grep vnic" | mdb -k
ffffff01a009bc00     40     2               id vnic_hash
[root@testsos ~]# echo "ffffff01a009bc00::modhash -e | ::print vnic_t vn_id" | mdb -k
vn_id = 0x2
vn_id = 0x3
[root@testsos ~]# echo "dlmgmt_id_avl::walk avl | ::print dlmgmt_link_t ll_linkid ll_link" | mdb -p $(pgrep dlmgmtd)
ll_linkid = 0x1
ll_link = [ "e1000g0" ]
ll_linkid = 0x2
ll_link = [ "net0" ]
ll_linkid = 0x3
ll_link = [ "tmp687240" ]
[root@testsos ~]# echo "::walk zone | ::print -a zone_t zone_name" | mdb -k
fffffffffc00fae0 zone_name = 0xfffffffffbf36579 "global"
ffffff01b053d740 zone_name = 0xffffff01daa59200 "
e0b64261-4708-c1f3-971b-dd561161b3dc"
[root@testsos ~]# echo "ffffff01b053d740::print zone_t zone_dl_list | ::walk list | ::print zone_dl_t zdl_id" | mdb -k
zdl_id = 0x3
zdl_id = 0x2
[root@testsos ~]# dladm create-vnic  -t -l e1000g0 -p zone=e0b64261-4708-c1f3-971b-dd561161b3dc -m 12:22:32:01:01:01 tmp687240
dladm: warning: failed to set property zone: operation failed
dladm: vnic creation over e1000g0 failed: operation failed

[root@testsos ~]# tail -f /var/adm/messages
2016-12-28T22:26:32.844371+00:00 testsos unix: [ID 665567 kern.warning] WARNING: kstat_create('unix', 0, 'tmp687240'): namespace collision
2016-12-28T22:26:32.844440+00:00 testsos unix: [ID 665567 kern.warning] WARNING: kstat_create('tmp687240', 0, 'mac_misc_stat'): namespace collision
2016-12-28T22:26:32.844445+00:00 testsos unix: [ID 665567 kern.warning] WARNING: kstat_create('tmp687240', 0, 'mac_rx_swlane0'): namespace collision
2016-12-28T22:26:32.844448+00:00 testsos unix: [ID 665567 kern.warning] WARNING: kstat_create('tmp687240', 0, 'mac_tx_swlane0'): namespace collision
2016-12-28T22:26:32.844452+00:00 testsos unix: [ID 665567 kern.warning] WARNING: kstat_create('tmp687240', 0, 'mac_tx_swlane0'): namespace collision
2016-12-28T22:26:32.844455+00:00 testsos unix: [ID 665567 kern.warning] WARNING: kstat_create('tmp687240', 0, 'mac_rx_swlane0_fanout0'): namespace collision
2016-12-28T22:26:32.844817+00:00 testsos mac: [ID 469746 kern.info] NOTICE: vnic1021 registered
2016-12-28T22:26:32.844823+00:00 testsos unix: [ID 665567 kern.warning] WARNING: kstat_create('link', 0, 'tmp687240'): namespace collision
2016-12-28T22:26:32.844827+00:00 testsos mac: [ID 435574 kern.info] NOTICE: vnic1021 link up, 1000 Mbps, unknown duplex
2016-12-28T22:26:32.858731+00:00 testsos mac: [ID 736570 kern.info] NOTICE: vnic1021 unregistered

As you can see from the messages there was a collision in kstat
creation and this causes the system to attempt to undo the partial
vnic creation (that's the "vnic unregistered" message). This undo
process, however, leaves the various datalink lists in an
inconcsistent state as shown below.

[root@testsos ~]# dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
net0        vnic      1500   ?        --         e1000g0
[root@testsos ~]# echo "::walk dls_devnet_cache | ::print dls_devnet_t dd_linkid dd_linkname" | mdb -k
dd_linkid = 0x3
dd_linkname = [ "tmp687240" ]
dd_linkid = 0x2
dd_linkname = [ "net0" ]
dd_linkid = 0x1
dd_linkname = [ "e1000g0" ]
[root@testsos ~]# echo "ffffff01a009bc00::modhash -e | ::print vnic_t vn_id" | mdb -k
vn_id = 0x2
vn_id = 0x3
[root@testsos ~]# echo "dlmgmt_id_avl::walk avl | ::print dlmgmt_link_t ll_linkid ll_link" | mdb -p $(pgrep dlmgmtd)
ll_linkid = 0x1
ll_link = [ "e1000g0" ]
ll_linkid = 0x2
ll_link = [ "net0" ]
ll_linkid = 0x4
ll_link = [ "tmp687240" ]
[root@testsos ~]# echo "ffffff01b053d740::print zone_t zone_dl_list | ::walk list | ::print zone_dl_t zdl_id" | mdb -k
zdl_id = 0x4
zdl_id = 0x3
zdl_id = 0x2

After the creation fails the state should be the same as before but
instead it's wrong in several ways:

This final point is problematic when stopping the zone because the
vnic deletion logic is being moved into zoneadmd and making use of the
zone's datalink list. If this list is incorrect then zoneadmd will try
to delete links that it shouldn't (either they don't exist or the id
is now mapped to another link, potentially one belonging to another
zone that it shouldn't delete). This issue is demonstrated below.

[root@testsos ~]# vmadm stop e0b64261-4708-c1f3-971b-dd561161b3dc
timed out waiting for zone to transition to installed
[root@testsos ~]# tail -f /var/adm/messages
2016-12-28T22:29:00.116629+00:00 testsos pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
2016-12-28T22:29:00.116671+00:00 testsos genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
2016-12-28T22:29:00+00:00 testsos zoneadmd[103303]: [ID 702911 daemon.error] [zone 'e0b64261-4708-c1f3-971b-dd561161b3dc'] failed to delete network interface due to failure to get link info
2016-12-28T22:29:00+00:00 testsos zoneadmd[103303]: [ID 702911 daemon.error] [zone 'e0b64261-4708-c1f3-971b-dd561161b3dc'] unable to delete transient vnics in zone
2016-12-28T22:29:00+00:00 testsos zoneadmd[103303]: [ID 702911 daemon.error] [zone 'e0b64261-4708-c1f3-971b-dd561161b3dc'] unable to destroy zone

Comment by Ryan Zezeski [X]
Created at 2017-01-04T14:47:09.000Z

The root cause is a logic error in dlmgmt_setzoneid().

While running my above test scenario I noticed that dlmgmtd aborts on
a call to avl_add().

[root@testsos ~]# echo "::status" | mdb /cores/core.dlmgmtd.31
debugging core file of dlmgmtd (32-bit) from testsos
file: /sbin/dlmgmtd
initial argv: /sbin/dlmgmtd
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=31 uid=0 code=-1
panic message:
assertion failed for thread 0xfec20240, thread-id 2: avl_find() succeeded inside
 avl_add(), file /root/smartos-live/projects/illumos/usr/src/common/avl/avl.c, l
ine 649

[root@testsos ~]# echo "\$C" | mdb /cores/core.dlmgmtd.31
fed2e94c libc.so.1`_lwp_kill+0x15(fed2e974, fed2e974, a8, fed2ea18, 2, 8)
fed2ecac libc.so.1`_assfail+0x25d(fed71654, fed71614, 289, fed71654, fed2ed0c)
fed2eccc libc.so.1`assfail+0x2b(fed71654, fed71614, 289, 806da84)
fed2ed0c libavl.so.1`avl_add+0x48(806ab5c, 806c960, fed2ed2c, fef61000)
fed2ed4c dlmgmt_setzoneid+0x156(fed2edf4, fed2ed80, fed2eda8, 0, 806da58,
fed2ed80)
fed2edcc dlmgmt_handler+0x96(0, fed2edf4, c, 0, 0, 8055186)
00000000 libc.so.1`__door_return+0x4b()

This is how it plays out in dls_devnet_setzid() and
dlmgmt_setzoneid(), notice the i_dls_devnet_setzid() upcall
returns with EINTR since dlmgmtd crashed.

[root@testsos ~]# dtrace -Fqn 'fbt::dls_devnet_setzid:entry { self->t=1; } fbt:dls::entry /self->t/ { printf("\n"); } fbt:dls::return /self->t/ { printf("=> %d (0x%p)\n", arg1, arg1); } fbt::dls_devnet_setzid:return /self->t/ { self->t=0; }'
CPU FUNCTION
  2  -> dls_devnet_setzid
  2    -> dls_devnet_hold
  2      -> dls_devnet_hold_common
  2        -> dls_mgmt_get_phydev
  2          -> i_dls_mgmt_get_linkattr
  2            -> i_dls_mgmt_upcall
  2              -> i_dls_mgmt_door_revoked
  2              <- i_dls_mgmt_door_revoked   => 0 (0x0)
  2            <- i_dls_mgmt_upcall           => 2 (0x2)
  2          <- i_dls_mgmt_get_linkattr       => 2 (0x2)
  2        <- dls_mgmt_get_phydev             => 22 (0x16)
  2      <- dls_devnet_hold_common            => 0 (0x0)
  2    <- dls_devnet_hold                     => 0 (0x0)
  2    -> i_dls_devnet_setzid
  2      -> i_dls_mgmt_upcall
  2        -> i_dls_mgmt_door_revoked
  2        <- i_dls_mgmt_door_revoked         => 0 (0x0)
  2      <- i_dls_mgmt_upcall                 => 4 (0x4)
  2    <- i_dls_devnet_setzid                 => 4 (0x4)
  2    -> dls_devnet_rele
  2    <- dls_devnet_rele                     => 1 (0x1)
  2  <- dls_devnet_setzid                     => 4 (0x4)

[root@testsos ~]# dtrace -Fqn 'pid$target::dlmgmt_setzoneid:entry { self->t=1; } pid$target:dlmgmtd::entry,pid$target::link_by_name:entry /self->t/ { printf("\n"); } pid$target:dlmgmtd::return,pid$target::link_by_name:return /self->t/ { printf("=> %d (0x%p)\n", arg1, arg1); } pid$target::dlmgmt_setzoneid:return /self->t/ { self->t=0; }' -p $(pgrep dlmgmtd)
CPU FUNCTION
  2  -> dlmgmt_setzoneid
  2    -> dlmgmt_table_lock
  2      -> dlmgmt_table_readwritelock
  2      <- dlmgmt_table_readwritelock        => 0 (0x0)
  2    <- dlmgmt_table_lock                   => 0 (0x0)
  2    -> link_by_id
  2      -> cmp_link_by_id
  2      <- cmp_link_by_id                    => 4294967295 (0xffffffff)
  2      -> cmp_link_by_id
  2      <- cmp_link_by_id                    => 1 (0x1)
  2      -> cmp_link_by_id
  2      <- cmp_link_by_id                    => 4294967295 (0xffffffff)
  2      -> cmp_link_by_id
  2      <- cmp_link_by_id                    => 0 (0x0)
  2    <- link_by_id                          => 134674200 (0x806f718)
  2    -> dlmgmt_checkprivs
  2    <- dlmgmt_checkprivs                   => 0 (0x0)
  2    -> cmp_link_by_zname
  2    <- cmp_link_by_zname                   => 1 (0x1)
  2    -> cmp_link_by_zname
  2      -> cmp_link_by_name
  2      <- cmp_link_by_name                  => 1 (0x1)
  2    <- cmp_link_by_zname                   => 1 (0x1)
  2    -> cmp_link_by_zname
  2      -> cmp_link_by_name
  2      <- cmp_link_by_name                  => 0 (0x0)
  2    <- cmp_link_by_zname                   => 0 (0x0)

When adding an item to the tree it first asserts that it doesn't
already exist. In this case it does. Why?

Inside dlmgmt_setzoneid() there is logic to determine if a link
with the given name is already assigned to the target zone:

	/*
	 * Before we remove the link from its current zone, make sure that
	 * there isn't a link with the same name in the destination zone.
	 */
	if (zoneid != GLOBAL_ZONEID &&
	    link_by_name(linkp->ll_link, newzoneid) != NULL) {
		err = EEXIST;
		goto done;
	}

This check should catch the assignment of the duplicate vnic and
return EEXIST but it doesn't because of the GZ check. This check
is wrong and should be removed. There is a check before this that
prevents an NGZ from running this function:

	/* We currently only allow changing zoneid's from the global zone. */
	if (zoneid != GLOBAL_ZONEID) {
		err = EACCES;
		goto done;
	}

Since an NGZ can never execute this function the link_by_name()
check will never execute and the avl_add() that follows later in
the function will always abort when it tries to add a duplicate vnic
name.


Comment by Ryan Zezeski [X]
Created at 2017-01-26T01:41:51.000Z

This fix should be upstreamed to gate.


Comment by Bot Bot [X]
Created at 2017-01-31T15:48:32.000Z

illumos-joyent commit 0e21e95 (branch master, by Ryan Zezeski)

OS-1457 dladm won't show or create vnics
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Jerry Jelinek <jerry.jelinek@joyent.com>