ZAPI-763: Connections refused by VMAPI on first boot

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-02-08T00:32:57.000Z)

Fix Versions

2017-02-16 JAMAICA DECOY (Release Date: 2017-02-16)

Description

Connections to VMAPI from outside of the VMAPI zone appear to be refused on first boot. However this behavior is not always reproducible. That said, it is always fixable by restarting the VMAPI service itself.

I uploaded a core of the VMAPI process while it is in the bad state and it can be debugged with thoth: {noformat}thoth debug 32fc7923fc5c3c6c3bd2e6b3e53270b5{noformat}

Comments

Comment by Julien Gilli [X]
Created at 2017-01-26T20:00:43.000Z
From @richard.kiene:

If I do `cat `svcs -L vmapi` | bunyan --color` when that is happening, I get nothing.

Comment by Josh Wilsdon
Created at 2017-01-26T20:16:41.000Z
Updated at 2018-05-07T17:51:39.301Z
it appears that the process is stuck connecting to moray:

> ::findjsobjects -p operations -p successes | ::findjsobjects | ::jsprint -d 4
{
    "operations": [
        {
            "func": function connectToMoray,
            "funcname": "connectToMoray",
            "status": "pending",
        },
        {
            "func": function connectToWfApi,
            "funcname": "connectToWfApi",
            "status": "ok",
        },
    ],
    "successes": [
        undefined,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
    ],
    "ndone": 1,
    "nerrors": 0,
}
{
    "operations": uninitialized,
    "successes": [],
    "ndone": 0,
    "nerrors": 0,
}
>

Comment by Todd Whiteman
Created at 2017-02-02T00:56:53.000Z
Here is my thoth core from today:
THOTH_USER=thoth thoth debug 0caccae6fc594b16af387775f70b03fc


Comment by Josh Wilsdon
Created at 2017-02-02T00:57:19.000Z
Updated at 2018-05-07T17:51:44.445Z
Hit what appears to be this same issue again:

[root@a5ac4681-bf75-4cb1-b106-694115e22961 (coal:vmapi0) ~]# netstat -an | grep LISTEN
[root@a5ac4681-bf75-4cb1-b106-694115e22961 (coal:vmapi0) ~]# svcs -p vmapi
STATE          STIME    FMRI
online         23:58:51 svc:/smartdc/site/vmapi:default
               23:58:10    17402 node
[root@a5ac4681-bf75-4cb1-b106-694115e22961 (coal:vmapi0) ~]# ptree 17402
7201  zsched
  7541  /sbin/init
    17402 /opt/smartdc/vmapi/build/node/bin/node --abort_on_uncaught_exception /opt/smart
[root@a5ac4681-bf75-4cb1-b106-694115e22961 (coal:vmapi0) ~]# pfiles 17402
17402:  /opt/smartdc/vmapi/build/node/bin/node --abort_on_uncaught_exception /
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:561,23 ino:1840805595 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /dev/null
      offset:0
   1: S_IFREG mode:0644 dev:90,65591 ino:66146 uid:0 gid:0 size:20826697
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
      /var/svc/log/smartdc-site-vmapi:default.log
      offset:20826697
   2: S_IFREG mode:0644 dev:90,65591 ino:66146 uid:0 gid:0 size:20826697
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
      /var/svc/log/smartdc-site-vmapi:default.log
      offset:20826697
   3: S_IFIFO mode:0000 dev:558,0 ino:10635 uid:60001 gid:60001 rdev:0,0
      O_RDWR FD_CLOEXEC
   4: S_IFIFO mode:0000 dev:558,0 ino:10635 uid:60001 gid:60001 rdev:0,0
      O_RDWR FD_CLOEXEC
   5: S_IFPORT mode:0000 dev:572,0 uid:60001 gid:60001 size:0
   6: S_IFIFO mode:0000 dev:558,0 ino:10636 uid:60001 gid:60001 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   7: S_IFIFO mode:0000 dev:558,0 ino:10636 uid:60001 gid:60001 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   8: S_IFIFO mode:0000 dev:558,0 ino:10637 uid:60001 gid:60001 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   9: S_IFIFO mode:0000 dev:558,0 ino:10637 uid:60001 gid:60001 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
  10: S_IFSOCK mode:0666 dev:569,0 ino:50844 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 40201
        peername: AF_INET 10.192.0.17  port: 2020
  11: S_IFDOOR mode:0444 dev:571,0 ino:139 uid:0 gid:0 rdev:570,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[12919]
  12: S_IFSOCK mode:0666 dev:569,0 ino:50799 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 41924
        peername: AF_INET 10.192.0.17  port: 2020
  13: S_IFSOCK mode:0666 dev:569,0 ino:51035 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 44616
        peername: AF_INET 10.192.0.17  port: 2020
  14: S_IFCHR mode:0666 dev:561,23 ino:1840805595 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /dev/null
      offset:0
  15: S_IFSOCK mode:0666 dev:569,0 ino:51033 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 52295
        peername: AF_INET 10.192.0.17  port: 2020
  16: S_IFSOCK mode:0666 dev:569,0 ino:51032 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 32811
        peername: AF_INET 10.192.0.17  port: 2020
  17: S_IFSOCK mode:0666 dev:569,0 ino:51016 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 36386
        peername: AF_INET 10.192.0.17  port: 2020
  18: S_IFSOCK mode:0666 dev:569,0 ino:50989 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 35607
        peername: AF_INET 10.192.0.17  port: 2020
  19: S_IFSOCK mode:0666 dev:569,0 ino:50988 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 50117
        peername: AF_INET 10.192.0.17  port: 2020
  20: S_IFSOCK mode:0666 dev:569,0 ino:50969 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 64954
        peername: AF_INET 10.192.0.17  port: 2020
  21: S_IFSOCK mode:0666 dev:569,0 ino:1174 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 54564
        peername: AF_INET 10.192.0.17  port: 2020
  22: S_IFSOCK mode:0666 dev:569,0 ino:41696 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 57876
        peername: AF_INET 10.192.0.19  port: 80
  23: S_IFSOCK mode:0666 dev:569,0 ino:11279 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.192.0.27  port: 54653
        peername: AF_INET 10.192.0.19  port: 80
[root@a5ac4681-bf75-4cb1-b106-694115e22961 (coal:vmapi0) ~]# gcore 17402
gcore: core.17402 dumped
[root@a5ac4681-bf75-4cb1-b106-694115e22961 (coal:vmapi0) ~]#

That core has been uploaded as 7d94d91bda7fad6eca99724a0579b227 in thoth.

Comment by Josh Wilsdon
Created at 2017-02-02T01:01:28.000Z
Updated at 2018-05-07T17:51:44.497Z
fwiw: I've not experienced this on first boot, but I've hit it on reboot.

Comment by Julien Gilli [X]
Created at 2017-02-02T01:30:10.000Z
After reproducing the problem by rebooting my Coal, curl is not able to connect to port 80 in the VMAPI zone:

[root@headnode (coal) ~]# sdc-vmapi /vms
curl: (7) Failed to connect to vmapi.coal.joyent.us port 80: Connection refused
[root@headnode (coal) ~]# 

There is indeed no socket accepting connections on that port in the VMAPI zone:
[root@headnode (coal) ~]# zlogin $(vmadm lookup alias=vmapi0) netstat -an

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.99.99.27.55631    10.99.99.11.2181     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.55156    10.99.99.19.80       1049792      0 1049800      0 ESTABLISHED
10.99.99.27.53301    10.99.99.19.80       1049792      0 1049800      0 ESTABLISHED
10.99.99.27.62014    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.55658    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.39324    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.50470    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.52567    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.49646    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.36118    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.65417    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.39832    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED
10.99.99.27.52072    10.99.99.17.2020     1049792      0 1049800      0 ESTABLISHED

Active UNIX domain sockets
Address  Type          Vnode     Conn  Local Addr      Remote Addr
ffffff04ff8b1798 stream-ord ffffff0500938b80 0000000 /var/run/.smartdc-amon.sock                
ffffff04ff8b9790 stream-ord 0000000 0000000                               
ffffff04ff8b9b50 stream-ord 0000000 0000000                               
ffffff04ff894008 stream-ord 0000000 0000000                               
ffffff04ff8943c8 stream-ord 0000000 0000000                               
ffffff04ff894788 stream-ord 0000000 0000000                               
ffffff04ff894b48 stream-ord 0000000 0000000                               
ffffff04ec673780 stream-ord 0000000 0000000                               
ffffff04ec378880 stream-ord 0000000 0000000                               
ffffff04da3ba468 stream-ord 0000000 0000000                               
ffffff04da3babe8 stream-ord 0000000 0000000                               
ffffff04d9d7e458 stream-ord 0000000 0000000                               
ffffff04d9d7ebd8 stream-ord 0000000 0000000                               
ffffff04cc742000 stream-ord 0000000 0000000                               
ffffff04cc742b40 stream-ord 0000000 0000000                               
ffffff04cc746c40 stream-ord 0000000 0000000                               
ffffff04cab550f8 stream-ord 0000000 0000000                               
ffffff04c903b870 stream-ord 0000000 0000000                               
ffffff04c903bc30 stream-ord 0000000 0000000                               
ffffff0412fe6c08 stream-ord 0000000 0000000                               
ffffff04114980b0 stream-ord 0000000 0000000                               
ffffff0411498bf0 stream-ord 0000000 0000000                               
ffffff041142d468 stream-ord 0000000 0000000                               
ffffff041139e818 stream-ord 0000000 0000000                               
ffffff0411cdb060 stream-ord 0000000 0000000                               
ffffff04112673f0 stream-ord 0000000 0000000                               
ffffff0411f4d7a0 stream-ord 0000000 0000000                               
ffffff04108e3008 stream-ord 0000000 0000000                               
ffffff040e4a04b8 stream-ord 0000000 0000000                               
ffffff040dadec20 stream-ord 0000000 0000000                               
ffffff040b706480 stream-ord 0000000 0000000                               
[root@headnode (coal) ~]#

The following entries are the latest log entries logged by the moray client:

[root@bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e (coal:vmapi0) ~]# cat `svcs -L vmapi` | grep moray | bunyan
[2017-02-02T01:11:24.808Z]  INFO: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=0, delay=10)
[2017-02-02T01:11:28.830Z]  WARN: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=1, delay=20)
[2017-02-02T01:11:28.862Z]  WARN: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=2, delay=40)
[2017-02-02T01:11:32.914Z]  WARN: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=3, delay=80)
[2017-02-02T01:11:33.010Z]  WARN: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=4, delay=160)
[2017-02-02T01:11:33.180Z] ERROR: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=5, delay=320)
[2017-02-02T01:11:38.511Z] ERROR: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=6, delay=640)
[2017-02-02T01:11:39.169Z] ERROR: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=7, delay=1280)
[2017-02-02T01:11:40.459Z] ERROR: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=8, delay=2560)
[2017-02-02T01:11:43.040Z] ERROR: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: dns: resolve attempted (domain=moray.coal.joyent.us, attempt=9, delay=5120)
[2017-02-02T01:11:48.186Z]  INFO: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: moray: connected (moray="[object MorayClient<host=moray.coal.joyent.us>]")
[2017-02-02T01:11:48.197Z] ERROR: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: moray client error (err.code=ECONNRESET)
    Error: read ECONNRESET
        at exports._errnoException (util.js:907:11)
        at TCP.onread (net.js:557:26)
[2017-02-02T01:11:48.204Z] ERROR: moray/20992 on bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e: Buckets were not loaded
    ConnectionClosedError: the underlying connection has been closed
        at Socket.<anonymous> (/opt/smartdc/vmapi/node_modules/moray/node_modules/fast/lib/client.js:451:29)
        at emitOne (events.js:82:20)
        at Socket.emit (events.js:169:7)
        at TCP._onclose (net.js:486:12)
[root@bd1d2f74-c2d3-46fc-8b7d-0fb30c512a1e (coal:vmapi0) ~]#

which seems to indicate that the moray client was able to connect to the moray service, and then got its connection reset.

It is possible that the changes in https://github.com/joyent/sdc-vmapi/commit/e596dfd63dfce50a62d1ad00cbfb4a7fb95f6488, which introduced a subtle change in behavior with regards to how the VMAPI service is initialized, introduced this regression.

I'm still investigating, hopefully I can update this ticket soon.

Comment by Julien Gilli [X]
Created at 2017-02-02T02:19:47.000Z
After generating a core file from a process stuck in that state, getting the addresses for the current pending vasync operations we get:

> ::findjsobjects -p operations -p successes | ::findjsobjects | ::jsprint -ab -d 4
81293045: 81293045: {
    "operations": 81293045: a3552421: [
        81293045: 9f053fc1: {
            "func": 81293045: fd6d0315: function connectToMoray,
            "funcname": 81293045: 87e2fb09: "connectToMoray",
            "status": 81293045: fd620849: "pending",
        },
        81293045: 8127f5b1: {
            "func": 81293045: 819bfb4d: function connectToWfApi,
            "funcname": 81293045: 87e2fb69: "connectToWfApi",
            "status": 81293045: 84e0f1b9: "ok",
        },
    ],
    "successes": 81293045: a3552431: [
        81293045: 9f0080dd: undefined,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
        81293045: 9f0080bd: hole,
    ],
    "ndone": 81293045: 2: 1,
    "nerrors": 81293045: 0: 0,
}
fd6389e9: fd6389e9: {
    "operations": fd6389e9: 9f008165: uninitialized,
    "successes": fd6389e9: fd638a05: [],
    "ndone": fd6389e9: 0: 0,
    "nerrors": fd6389e9: 0: 0,
}

Looking at the connectToMoray closure we find the morayApi instance:

> fd6d0315::jsclosure
    "config": b121d2cd: {
        "logLevel": b1260e2d: "debug",
        "maxSockets": c8: 100,
        "useVmAgent": 9f008115: true,
        "ufdsAdminUuid": b1260e41: "930896af-bf8c-48d4-885c-6573a94b1853",
        "reserveKvmStorage": a6af32f5: [...],
        "api": a6af3301: [...],
        "overlay": a6af330d: [...],
        "wfapi": a6af3329: [...],
        "cnapi": a6af3345: [...],
        "imgapi": a6af3361: [...],
        "napi": b122e119: "9.0.0",
        "papi": 9f0080dd: undefined,
    }
    "vmapiLog": b121d25d: {
        "domain": 9f008081: null,
        "_events": b1260d85: [...],
        "_eventsCount": 0: 0,
        "_maxListeners": 9f0080dd: undefined,
        "_level": b1260da1: 20,
        "streams": b1260dad: [...],
        "serializers": b1260dbd: [...],
        "src": 9f008135: false,
        "fields": b1260dd9: [...],
        "haveNonRawStreams": 9c9081e1: <unknown JavaScript object type "Map">,
    }
    "changefeedOptions": b1260e9d: {
        "resources": a6af336d: [...],
        "moray": a6af337d: [...],
        "maxAge": f0: 120,
        "log": a6af3399: [...],
    }
    "morayApi": b1260eb9: {
        "domain": 9f008081: null,
        "_events": a6af3409: [...],
        "_eventsCount": 2: 1,
        "_maxListeners": 9f0080dd: undefined,
        "log": a6af3425: [...],
        "options": a6af3345: [...],
        "connection": fd6d1611: [...],
    }
    "apiClients": fd6d0205: {
        "cnapi": b121d20d: [...],
        "imgapi": b121d1bd: [...],
        "napi": 9db69905: [...],
        "papi": 9db69939: [...],
        "wfapi": 9db6996d: [...],
    }
>

And that instance has the following node-fast pool:

> b1260eb9::jsprint connection.pool        
{
    "domain": null,
    "_events": {
        "error": function <anonymous> (as <anon>),
    },
    "_eventsCount": 1,
    "_maxListeners": undefined,
    "log": {
        "domain": null,
        "_events": [...],
        "_eventsCount": 0,
        "_maxListeners": undefined,
        "_isSimpleChild": true,
        "_level": 30,
        "streams": [...],
        "serializers": [...],
        "src": false,
        "fields": [...],
    },
    "port": 2020,
    "max": 10,
    "connectTimeout": 200,
    "retry": {
        "retries": Inf,
        "minTimeout": 500,
        "maxTimeout": 16000,
    },
    "ring": {
        "domain": null,
        "_events": [...],
        "_eventsCount": 4,
        "_maxListeners": undefined,
        "_members": [...],
        "_active": [...],
        "_next": 1,
        "priorityFunc": undefined,
    },
}
> 

where we can see that the retry options specify infinite retries:

"retry": {
        "retries": Inf,
        "minTimeout": 500,
        "maxTimeout": 16000,
    },

The corresponding node-fast ring has one active connection:
> b1260eb9::jsprint connection.pool.ring
{
    "domain": null,
    "_events": {
        "online": function <anonymous> (as <anon>),
        "offline": function <anonymous> (as <anon>),
    },
    "_eventsCount": 4,
    "_maxListeners": undefined,
    "_members": {
        "10.99.99.17": [...],
    },
    "_active": [
        "10.99.99.17",
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
        hole,
    ],
    "_next": 1,
    "priorityFunc": undefined,
}
> 
to 10.99.99.17, which is the moray instance:
[root@headnode (coal) ~]# vmadm list -o alias,nics.0.ip | grep 10.99.99.17
moray0      10.99.99.17
[root@headnode (coal) ~]#

In that moray instance, we see ESTABLISHED connections from the VMAPI zone:
[root@headnode (coal) ~]# zlogin $(vmadm lookup alias=moray0) netstat -an | grep $(vmadm list -o alias,nics.0.ip | grep vmapi0 | tr -s ' ' | cut -d ' ' -f 2)
10.99.99.17.2020     10.99.99.27.62014    1049792      0 1049800      0 ESTABLISHED
10.99.99.17.2020     10.99.99.27.55658    1049792      0 1049800      0 ESTABLISHED
10.99.99.17.2020     10.99.99.27.39324    1049792      0 1049800      0 ESTABLISHED
10.99.99.17.2020     10.99.99.27.50470    1049792      0 1049800      0 ESTABLISHED
10.99.99.17.2020     10.99.99.27.52567    1049792      0 1049800      0 ESTABLISHED
10.99.99.17.2020     10.99.99.27.49646    1049792      0 1049800      0 ESTABLISHED
10.99.99.17.2020     10.99.99.27.36118    1049792      0 1049800      0 ESTABLISHED
10.99.99.17.2020     10.99.99.27.65417    1049792      0 1049800      0 ESTABLISHED
10.99.99.17.2020     10.99.99.27.39832    1049792      0 1049800      0 ESTABLISHED
10.99.99.17.2020     10.99.99.27.52072    1049792      0 1049800      0 ESTABLISHED
[root@headnode (coal) ~]#

Comment by Julien Gilli [X]
Created at 2017-02-02T23:39:16.000Z
Looking further at the members in the node-fast ring:

> b1260eb9::jsprint -ba connection.pool.ring._members  
b1260eb9: a355245d: {
    "10.99.99.17": b1260eb9: 9db2a6c1: {
        "active": b1260eb9: fd67d021: <unknown JavaScript object type "AccessorPair">,
        "_active": b1260eb9: 84e16d7d: [...],
        "members": b1260eb9: fd67cfe1: <unknown JavaScript object type "AccessorPair">,
        "_events": b1260eb9: 84e16d8d: [...],
        "domain": b1260eb9: 9f008081: null,
        "_next": b1260eb9: 2: 1,
        "_maxListeners": b1260eb9: 9f0080dd: undefined,
        "_members": b1260eb9: 84e16da9: [...],
        "priorityFunc": b1260eb9: 812f9b6d: function leastRequestsPriority,
        "_eventsCount": b1260eb9: 4: 2,
    },
}
> 84e16da9::jsprint -d 3
{
    "0": {
        "domain": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "_eventsCount": 3,
        "_maxListeners": undefined,
        "fast_msgid": 1,
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_connect_timeout": null,
        "fast_requests": {
            "1": [...],
        },
        "_pending_requests": 1,
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "closed": false,
        "_fast_retry": null,
        "_fast_pending_conn": null,
        "log": {                      
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
    },
    "1": {
        "closed": false,
        "countPending": <unknown JavaScript object type "AccessorPair">,
        "_pending_requests": 0,
        "_fast_pending_conn": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "log": {
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
        "domain": null,
        "fast_msgid": 0,
        "_fast_connect_timeout": null,
        "fast_reconnect": function <anonymous> (as Client.fast_reconnect),
        "_maxListeners": undefined,
        "fast_timer": {
            "_called": true,
            "_idleTimeout": 1000,
            "_idlePrev": null,
            "_idleNext": null,
            "_idleStart": 37075,
            "_onTimeout": function <anonymous> (as <anon>),
            "_repeat": null,
        },
        "messageEncoder": {
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 2,
            "_maxListeners": undefined,
            "_outbound": [...],
        },
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_retry": null,
        "fast_requests": {},
        "_eventsCount": 3,
        "messageDecoder": {
            "_writableState": [...],
            "writable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 6,
            "_maxListeners": undefined,
            "_buf": null,
            "_msg": null,
        },
    },
    "2": {
        "closed": false,
        "countPending": <unknown JavaScript object type "AccessorPair">,
        "_pending_requests": 0,
        "_fast_pending_conn": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "log": {
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
        "domain": null,
        "fast_msgid": 0,
        "_fast_connect_timeout": null,
        "fast_reconnect": function <anonymous> (as Client.fast_reconnect),
        "_maxListeners": undefined,
        "fast_timer": {
            "_called": true,
            "_idleTimeout": 1000,
            "_idlePrev": null,
            "_idleNext": null,
            "_idleStart": 37077,
            "_onTimeout": function <anonymous> (as <anon>),
            "_repeat": null,
        },
        "messageEncoder": {
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 2,
            "_maxListeners": undefined,
            "_outbound": [...],
        },
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_retry": null,
        "fast_requests": {},
        "_eventsCount": 3,
        "messageDecoder": {
            "_writableState": [...],
            "writable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 6,
            "_maxListeners": undefined,
            "_buf": null,
            "_msg": null,
        },
    },
    "3": {
        "closed": false,
        "countPending": <unknown JavaScript object type "AccessorPair">,
        "_pending_requests": 0,
        "_fast_pending_conn": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "log": {
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
        "domain": null,
        "fast_msgid": 0,
        "_fast_connect_timeout": null,
        "fast_reconnect": function <anonymous> (as Client.fast_reconnect),
        "_maxListeners": undefined,
        "fast_timer": {
            "_called": true,
            "_idleTimeout": 1000,
            "_idlePrev": null,
            "_idleNext": null,
            "_idleStart": 37076,
            "_onTimeout": function <anonymous> (as <anon>),
            "_repeat": null,
        },
        "messageEncoder": {
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 2,
            "_maxListeners": undefined,
            "_outbound": [...],
        },
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_retry": null,
        "fast_requests": {},
        "_eventsCount": 3,
        "messageDecoder": {
            "_writableState": [...],
            "writable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 6,
            "_maxListeners": undefined,
            "_buf": null,
            "_msg": null,
        },
    },
    "4": {
        "closed": false,
        "countPending": <unknown JavaScript object type "AccessorPair">,
        "_pending_requests": 0,
        "_fast_pending_conn": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "log": {
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
        "domain": null,
        "fast_msgid": 0,
        "_fast_connect_timeout": null,
        "fast_reconnect": function <anonymous> (as Client.fast_reconnect),
        "_maxListeners": undefined,
        "fast_timer": {
            "_called": true,
            "_idleTimeout": 1000,
            "_idlePrev": null,
            "_idleNext": null,
            "_idleStart": 37076,
            "_onTimeout": function <anonymous> (as <anon>),
            "_repeat": null,
        },
        "messageEncoder": {
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 2,
            "_maxListeners": undefined,
            "_outbound": [...],
        },
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_retry": null,
        "fast_requests": {},
        "_eventsCount": 3,
        "messageDecoder": {
            "_writableState": [...],
            "writable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 6,
            "_maxListeners": undefined,
            "_buf": null,
            "_msg": null,
        },
    },
    "5": {
        "closed": false,
        "countPending": <unknown JavaScript object type "AccessorPair">,
        "_pending_requests": 0,
        "_fast_pending_conn": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "log": {
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
        "domain": null,
        "fast_msgid": 0,
        "_fast_connect_timeout": null,
        "fast_reconnect": function <anonymous> (as Client.fast_reconnect),
        "_maxListeners": undefined,
        "fast_timer": {
            "_called": true,
            "_idleTimeout": 1000,
            "_idlePrev": null,
            "_idleNext": null,
            "_idleStart": 37076,
            "_onTimeout": function <anonymous> (as <anon>),
            "_repeat": null,
        },
        "messageEncoder": {
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 2,
            "_maxListeners": undefined,
            "_outbound": [...],
        },
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_retry": null,
        "fast_requests": {},
        "_eventsCount": 3,
        "messageDecoder": {
            "_writableState": [...],
            "writable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 6,
            "_maxListeners": undefined,
            "_buf": null,
            "_msg": null,
        },
    },
    "6": {
        "closed": false,
        "countPending": <unknown JavaScript object type "AccessorPair">,
        "_pending_requests": 0,
        "_fast_pending_conn": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "log": {
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
        "domain": null,
        "fast_msgid": 0,
        "_fast_connect_timeout": null,
        "fast_reconnect": function <anonymous> (as Client.fast_reconnect),
        "_maxListeners": undefined,
        "fast_timer": {
            "_called": true,
            "_idleTimeout": 1000,
            "_idlePrev": null,
            "_idleNext": null,
            "_idleStart": 37076,
            "_onTimeout": function <anonymous> (as <anon>),
            "_repeat": null,
        },
        "messageEncoder": {
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 2,
            "_maxListeners": undefined,
            "_outbound": [...],
        },
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_retry": null,
        "fast_requests": {},
        "_eventsCount": 3,
        "messageDecoder": {
            "_writableState": [...],
            "writable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 6,
            "_maxListeners": undefined,
            "_buf": null,
            "_msg": null,
        },
    },
    "7": {
        "closed": false,
        "countPending": <unknown JavaScript object type "AccessorPair">,
        "_pending_requests": 0,
        "_fast_pending_conn": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "log": {
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
        "domain": null,
        "fast_msgid": 0,
        "_fast_connect_timeout": null,
        "fast_reconnect": function <anonymous> (as Client.fast_reconnect),
        "_maxListeners": undefined,
        "fast_timer": {
            "_called": true,
            "_idleTimeout": 1000,
            "_idlePrev": null,
            "_idleNext": null,
            "_idleStart": 37076,
            "_onTimeout": function <anonymous> (as <anon>),
            "_repeat": null,
        },
        "messageEncoder": {
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 2,
            "_maxListeners": undefined,
            "_outbound": [...],
        },
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_retry": null,
        "fast_requests": {},
        "_eventsCount": 3,
        "messageDecoder": {
            "_writableState": [...],
            "writable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 6,
            "_maxListeners": undefined,
            "_buf": null,
            "_msg": null,
        },
    },
    "8": {
        "closed": false,
        "countPending": <unknown JavaScript object type "AccessorPair">,
        "_pending_requests": 0,
        "_fast_pending_conn": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "log": {
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
        "domain": null,
        "fast_msgid": 0,
        "_fast_connect_timeout": null,
        "fast_reconnect": function <anonymous> (as Client.fast_reconnect),
        "_maxListeners": undefined,
        "fast_timer": {
            "_called": true,
            "_idleTimeout": 1000,
            "_idlePrev": null,
            "_idleNext": null,
            "_idleStart": 37077,
            "_onTimeout": function <anonymous> (as <anon>),
            "_repeat": null,
        },
        "messageEncoder": {
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 2,
            "_maxListeners": undefined,
            "_outbound": [...],
        },
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_retry": null,
        "fast_requests": {},
        "_eventsCount": 3,
        "messageDecoder": {
            "_writableState": [...],
            "writable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 6,
            "_maxListeners": undefined,
            "_buf": null,
            "_msg": null,
        },
    },
    "9": {
        "closed": false,
        "countPending": <unknown JavaScript object type "AccessorPair">,
        "_pending_requests": 0,
        "_fast_pending_conn": null,
        "_events": {
            "connect": function <anonymous> (as <anon>),
            "close": function <anonymous> (as <anon>),
            "error": function <anonymous> (as <anon>),
        },
        "log": {
            "domain": null,
            "_events": [...],
            "_eventsCount": 0,
            "_maxListeners": undefined,
            "_level": 30,
            "streams": [...],
            "serializers": [...],
            "src": false,
            "fields": [...],
            "haveNonRawStreams": undefined,
        },
        "domain": null,
        "fast_msgid": 0,
        "_fast_connect_timeout": null,
        "fast_reconnect": function <anonymous> (as Client.fast_reconnect),
        "_maxListeners": undefined,
        "fast_timer": {
            "_called": true,
            "_idleTimeout": 1000,
            "_idlePrev": null,
            "_idleNext": null,
            "_idleStart": 37076,
            "_onTimeout": function <anonymous> (as <anon>),
            "_repeat": null,
        },
        "messageEncoder": {
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 2,
            "_maxListeners": undefined,
            "_outbound": [...],
        },
        "_options": {
            "host": "10.99.99.17",
            "port": 2020,
            "connectTimeout": 200,
            "retry": [...],
        },
        "fast_conn": {
            "_connecting": false,
            "_hadError": false,
            "_handle": [...],
            "_parent": null,
            "_host": null,
            "_readableState": [...],
            "readable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 8,
            "_maxListeners": undefined,
            "_writableState": [...],
            "writable": true,
            "allowHalfOpen": false,
            "destroyed": false,
            "_bytesDispatched": 0,
            "_sockname": null,
            "_pendingData": null,
            "_pendingEncoding": "",
            "server": null,
            "_server": null,
            "_consuming": true,
            "_idleNext": undefined,
            "_idlePrev": undefined,
        },
        "_fast_retry": null,
        "fast_requests": {},
        "_eventsCount": 3,
        "messageDecoder": {
            "_writableState": [...],
            "writable": true,
            "domain": null,
            "_events": [...],
            "_eventsCount": 6,
            "_maxListeners": undefined,
            "_buf": null,
            "_msg": null,
        },
    },
}
> 

So it seems that none of the ten potential connections in the ring was actually established. Initially,I thought that the ESTABLISHED connections we were seeing previously with netstat -an were established by VMAPI's changefeed publisher instead of the moray client used to implement VMAPI's database layer, but it doesn't seem that at that point the changefeed publisher is initialized, or that even a VMAPI application instance was created :

> ::findjsobjects -p changefeedPublisher
> ::findjsobjects -c VMAPI
>

Comment by Julien Gilli [X]
Created at 2017-02-03T02:16:09.000Z
Updated at 2017-02-03T02:20:10.000Z
It seems that the connections from VMAPI's database layer's moray client were actually established.

The problem is that, once node-moray establishes a connection to the moray service, it removes the event listener on the 'online' event emitted by its ConnectionPool through the use of {{once}} instead of {{on}}.

This is confirmed by looking at the only MorayClient's ConnectionPool instance in the core file, where there is no event listener for the 'online' event anymore:

> ::findjsobjects -c MorayClient | ::findjsobjects | ::jsprint pool._events
{
    "error": function <anonymous> (as <anon>),
}
> 

To summarize, this is what happens:

1. The MorayClient emits the connect event and removes its listener on the underlying ConnectionPool 'online' event.
2. The connection that was just established is closed, and the {{_setupBucket}} method is called, which fails due to the connection being closed.
3. The 'connect' event will never be emitted again by the moray client. Thus, the _setupBucket method will never be called again, and the 'moray-ready' event will never be emitted.

This is a bug that was present before the change made for ZAPI-757. However before that change, the symptom would be that VMAPI's moray buckets would not be setup, but the service would still be initialized and HTTP requests would be served normally.

What changed with ZAPI-757 is that, in that failure mode, the VMAPI initialization process never completes and HTTP requests can't be served.

VMAPI's changefeed publisher doesn't have that problem because it sets up its own infinite backoff retry mechanism, and so in that case, it eventually manages to initialize its buckets once the underlying moray client connection is re-established.

This could be one way to solve that problem for VMAPI's database layer, even though it wouldn't fix the cause of the problem.

Another solution that seems would align better with our long term goals would be to use a node-moray version that uses cueball, which potentially doesn't have the same problem as the one currently used by VMAPI.



Comment by Julien Gilli [X]
Created at 2017-02-07T22:47:30.000Z
CR submitted at https://cr.joyent.us/1467. I decided to go the backoff route:

1. I didn't want to introduce a change as large and as potentially breaking as migrating to a node-moray client that uses cueball.
2. Regardless of the migration to a node-moray client that manages reconnections better, the moray buckets setup process would need to be retried in case of failures.

This CR was tested by rebooting my COAL many times and making sure that VMAPI was up and able to serve requests after reboot was complete. Each time, I also took a look at VMAPI's logs to make sure that the backoff process and the buckets setup process were behaving as expected.

Comment by Bot Bot [X]
Created at 2017-02-08T00:32:30.000Z

sdc-vmapi commit 7defce1 (branch master, by Julien Gilli)

ZAPI-763 Connections refused by VMAPI on first boot
    Reviewed by: Josh Wilsdon <jwilsdon@joyent.com>
    Reviewed by: Trent Mick <trent.mick@joyent.com>
    Approved by: Julien Gilli <julien.gilli@joyent.com>