MORAY-398: moray caught connecting to deposed peer

Details

Issue Type:Bug
Priority:3 - Elevated
Status:Resolved
Created at:2017-02-08T17:43:24.000Z
Updated at:2019-04-22T23:44:27.714Z

People

Created by:Former user
Reported by:Former user
Assigned to:Former user

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-02-27T18:41:45.000Z)

Related Issues

Description

After some major networking issues, we were getting alarms like this one:

{
  "v": 1,
  "type": "probe",
  "user": "8a70155f-752f-eb91-a755-f4edbee5c99f",
  "probeUuid": "41b83717-42b5-4ad2-b71e-96533b805e12",
  "clear": false,
  "data": {
    "message": "Command failed (exit status: 1).",
    "value": 3,
    "details": {
      "cmd": "/bin/bash -c 'if [[ -f /opt/smartdc/minnow/bin/check-minnow ]]; then /opt/smartdc/minnow/bin/check-minnow; exit $?; fi; let delta=$(date +%s)-$(PATH=/opt/smartdc/minnow/build/node/bin:/opt/smartdc/minnow/node_modules/.bin:$PATH findobjects -h $(cat /opt/smartdc/minnow/etc/config.json | json moray.host) manta_storage hostname=$(hostname)* | json -e _mtime=_mtime/1000 -e _mtime=~~_mtime _mtime) ; test $delta -lt 900'",
      "exitStatus": 1,
      "signal": null,
      "stdout": "",
      "stderr": "check-minnow.js: record \"090e6300-cdd2-413a-b5e6-9e74fe3a996a.stor.us-east.scloud.host\" is too old (found timestamp \"2017-02-08T16:28:36.429Z\", expecting one from at least \"2017-02-08T16:34:05.572Z\"\n"
    }
  },
  "machine": "090e6300-cdd2-413a-b5e6-9e74fe3a996a",
  "uuid": "0504760b-d2a1-ec77-c6bd-c9dd1e74aca7",
  "time": 1486572545589,
  "agent": "090e6300-cdd2-413a-b5e6-9e74fe3a996a",
  "agentAlias": "storage.us-east.scloud.host-090e6300",
  "relay": "00000000-0000-0000-0000-0cc47adec00c"
}
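The probe's failing check is just timestamp arithmetic: the record's _mtime must be within 900 seconds of now. Plugging in this alarm's own values (the probe's "time" field and the record timestamp from stderr) shows how far over the limit it was (a sketch; the variable names are mine):

```javascript
// Freshness check from the probe command above, restated with the
// alarm's values: the probe fired at "time" 1486572545589 (ms since
// epoch) and the stale record's _mtime was 2017-02-08T16:28:36.429Z.
const probeTimeMs = 1486572545589;
const recordMtimeMs = Date.parse('2017-02-08T16:28:36.429Z');
const deltaSec = Math.floor((probeTimeMs - recordMtimeMs) / 1000);
console.log(deltaSec);        // 1229 seconds, vs. the 900-second limit
console.log(deltaSec < 900);  // false -> "test $delta -lt 900" fails
```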

That message reports that the minnow record is too old (about five and a half minutes past the 15-minute grace period). Over on that minnow, I'm seeing these entries in the minnow log:

[2017-02-08T17:03:10.668Z] ERROR: minnow/46973 on 090e6300-cdd2-413a-b5e6-9e74fe3a996a: moray: update failed
    VError: moray client ("10.136.1.41:55825" to "10.128.0.171:2021"): request failed: server error: connect ECONNREFUSED
        at FastClientRequest.<anonymous> (/opt/smartdc/minnow/node_modules/moray/lib/rpc.js:105:19)
        at FastClientRequest.g (events.js:180:16)
        at FastClientRequest.emit (events.js:95:17)
        at Object._onImmediate (/opt/smartdc/minnow/node_modules/moray/node_modules/fast/lib/fast_client.js:601:11)
        at processImmediate [as _immediateCallback] (timers.js:363:15)
    Caused by: FastRequestError: request failed: server error: connect ECONNREFUSED
        at FastClient.requestFail (/opt/smartdc/minnow/node_modules/moray/node_modules/fast/lib/fast_client.js:582:22)
        at FastClient.onMessage (/opt/smartdc/minnow/node_modules/moray/node_modules/fast/lib/fast_client.js:753:7)
        at FastMessageDecoder.onDecoderMessage (/opt/smartdc/minnow/node_modules/moray/node_modules/fast/lib/fast_client.js:431:48)
        at FastMessageDecoder.emit (events.js:95:17)
        at FastMessageDecoder.<anonymous> (_stream_readable.js:765:14)
        at FastMessageDecoder.emit (events.js:92:17)
        at emitReadable_ (_stream_readable.js:427:10)
        at emitReadable (_stream_readable.js:423:5)
        at readableAddChunk (_stream_readable.js:166:9)
        at FastMessageDecoder.Readable.push (_stream_readable.js:128:10)
        at FastMessageDecoder.Transform.push (_stream_transform.js:140:32)
    Caused by: FastServerError: server error: connect ECONNREFUSED
        at FastClient.onMessage (/opt/smartdc/minnow/node_modules/moray/node_modules/fast/lib/fast_client.js:753:28)
        at FastMessageDecoder.onDecoderMessage (/opt/smartdc/minnow/node_modules/moray/node_modules/fast/lib/fast_client.js:431:48)
        at FastMessageDecoder.emit (events.js:95:17)
        at FastMessageDecoder.<anonymous> (_stream_readable.js:765:14)
        at FastMessageDecoder.emit (events.js:92:17)
        at emitReadable_ (_stream_readable.js:427:10)
        at emitReadable (_stream_readable.js:423:5)
        at readableAddChunk (_stream_readable.js:166:9)
        at FastMessageDecoder.Readable.push (_stream_readable.js:128:10)
        at FastMessageDecoder.Transform.push (_stream_transform.js:140:32)
        at FastMessageDecoder.decode (/opt/smartdc/minnow/node_modules/moray/node_modules/fast/lib/fast_protocol.js:454:8)
    Caused by: Error: connect ECONNREFUSED
        at errnoException (net.js:901:11)
        at Object.afterConnect [as oncomplete] (net.js:892:19)

I misread this at first: it's not minnow getting ECONNREFUSED from moray; rather, it's the moray server responding to the RPC request with an error indicating that it (moray) had gotten ECONNREFUSED.
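The nested "Caused by" entries above are a verror-style cause chain: the client-side VError wraps the fast-protocol request error, which wraps the server-reported FastServerError, whose root cause is the ECONNREFUSED. A minimal sketch of walking such a chain to the root cause (using plain Error objects with the standard `cause` property rather than the verror API, so the structure here is illustrative only):

```javascript
// Build a chain mirroring the stack trace above: request error ->
// server error -> root ECONNREFUSED. (Node >= 16.9 for the `cause`
// constructor option; moray itself uses verror, not this.)
const inner = new Error('connect ECONNREFUSED');
const serverErr = new Error('server error: connect ECONNREFUSED',
    { cause: inner });
const requestErr = new Error(
    'request failed: server error: connect ECONNREFUSED',
    { cause: serverErr });

// Walk to the innermost cause: that's where the error really happened.
let root = requestErr;
while (root.cause) root = root.cause;
console.log(root.message);  // connect ECONNREFUSED
```

The point is that the innermost error originated on the moray server's own outbound connection, not on the minnow-to-moray hop.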

The pid for that moray in that zone is:

[root@d63cfaae (moray) ~]$ svcs -p moray-2021
STATE          STIME    FMRI
online         Feb_06   svc:/smartdc/application/moray:moray-2021
               Feb_06       5374 node

It's not stuck or anything:

[root@d63cfaae (moray) ~]$ pstack 5374
5374:   node main.js -f etc/config.json -p 2021
-----------------  lwp# 1 / thread# 1  --------------------
 fe9819b7 portfs   (6, 5, 8043c04, 400, 1, 8047c04)
 083f1a1b uv__io_poll (8ef54e0, 194, f4240, 0, 8ef54f8, 0) + 18b
 083e5b1a uv_run   (8ef54e0, 0, 8047d14, 0, 5, 8fc8f60) + da
 0828b25d _ZN4node5StartEiPPc (6, 8047d14, 8047cc8, 8047ce8, 82802a4, feffb0a8) + 16d
 0829b97b main     (6, 8047d14, 8047d30, 8047d08, 81efc62, 82802b0) + 1b
 081efcc3 _start   (6, 8047e04, 8047e09, 8047e11, 8047e14, 8047e24) + 83

But here's a big red flag:

[root@d63cfaae (moray) ~]$ moraystat.d 
         ------PG------  --------------MORAY---------------
PID      CONN QLEN  OPS  GETS FIND PUTS UPDS DELS BTCH RIDX
5382        0    0    0     0   15    5    0    0    1    0
5374        0    0    0     0   18    5    0    0    1    0
5378       12    0 1568     0  383    3    0    0    5    0
5380       16    2 1835     0  452    3    0    0    5    0
5382        0    0    0     0   19    3    0    0    1    0
5374        0    0    0     0   19    7    0    0    1    0
5380       13    0 1838     0  437    5    0    0   17    0
5378       16    0 1536     0  367    2    0    0   16    0
5382        0    0    0     0   16    4    0    0    3    0
5374        0    0    0     0   17    5    0    0    3    0
5378       12    0 1486     0  367    4    0    0    0    0
5380       15    0 1741     1  418    7    0    0   15    0
5382        0    0    0     0   11    3    0    0    0    0
5374        0    0    0     0   15    5    0    0    1    0
5378       11    0 1417     0  350    2    0    0    3    0
5380       15    0 1812     1  446    3    0    0    3    0
5382        0    0    0     0   15    5    0    0    0    0
5374        0    0    0     1   17    3    0    0    2    0
5378        8    0 1514     2  358    4    0    0   18    0
5380       15    0 1843     0  441    1    0    0   18    0

Two of the four moray processes are doing tons of work, while the other two are doing very little and have no working connections to PostgreSQL.

Our bum process is spewing this to its log:

[2017-02-08T17:22:44.467Z] FATAL: moray/pgpool/5374 on d63cfaae-4535-40b1-904c-4c5160ea6aa5: unable to create PG client
    Error: connect ECONNREFUSED
        at errnoException (net.js:901:11)
        at Object.afterConnect [as oncomplete] (net.js:892:19)

Unfortunately, it's not logging the IP it's trying to connect to, but we can guess from the netstat output:

[root@d63cfaae (moray) ~]$ netstat -a -f inet -P tcp -n  | grep -v ESTABLISHED

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
      *.2020               *.*                0      0 1048576      0 LISTEN
      *.2023               *.*                0      0 1048576      0 LISTEN
      *.2024               *.*                0      0 1048576      0 LISTEN
      *.2022               *.*                0      0 1048576      0 LISTEN
      *.2021               *.*                0      0 1048576      0 LISTEN
10.128.0.171.36922   10.128.0.139.5432        0      0 1049740      0 SYN_SENT
10.128.0.171.50991   10.128.0.139.5432        0      0 1049740      0 SYN_SENT
10.128.0.171.49023   10.128.0.139.5432        0      0 1049740      0 SYN_SENT
10.128.0.171.47552   10.128.0.139.5432        0      0 1049740      0 SYN_SENT
...

All of the SYN_SENT connections are to 10.128.0.139. From a postgres zone for this shard, we can see that that's the deposed peer for this shard, so ECONNREFUSED is not surprising:

[root@67a2dda3 (postgres) ~]$ manatee-adm peers
ROLE     PEERNAME                             IP              
primary  967a8538-357f-4687-8f8d-266301bfe42e 10.132.0.136    
sync     c5d5239c-d720-42e3-89d8-0abeba0f51f7 10.136.0.148    
deposed  67a2dda3-5e1b-4c75-a522-2dd80d99064d 10.128.0.139    

But why didn't Moray notice that the cluster state has changed?

Comments

Comment by Former user
Created at 2017-02-08T20:25:58.000Z

Do the busted moray processes have established connections to ZooKeeper? I picked pid 5374. Its "pfiles" output shows an fd to one of the ZooKeeper peers:

 524: S_IFSOCK mode:0666 dev:568,0 ino:53053 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_SNDBUF(128000),SO_RCVBUF(1049800)
        sockname: AF_INET 10.128.0.171  port: 44776
        peername: AF_INET 10.132.0.129  port: 2181

Here's that connection in netstat(1) on the Moray side:

[root@d63cfaae (moray) ~]$ netstat -f inet -P tcp -n | grep 10.132.0.129
10.128.0.171.54965   10.132.0.129.2181    1049792      0 1049800      0 ESTABLISHED
10.128.0.171.44776   10.132.0.129.2181    1049792      0 1049800      0 ESTABLISHED
[root@d63cfaae (moray) ~]$ 

But there's no such connection from ZooKeeper:

[root@aad205d9 (nameservice) ~]$ netstat -f inet -P tcp -n -a | grep 10.128.0.171
10.132.0.129.2181    10.128.0.171.54965   1049792      0 1049800      0 ESTABLISHED
[root@aad205d9 (nameservice) ~]$ 

There are log entries from ZooKeeper last night related to this client, though:

    2017-02-08 04:33:17,977 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /10.128.0.171:44776
    2017-02-08 04:33:17,977 [myid:3] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /10.128.0.171:44776; will be dropped if server is in r-o mode
    2017-02-08 04:33:17,977 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /10.128.0.171:44776
    2017-02-08 04:33:17,978 [myid:3] - INFO  [CommitProcessor:3:ZooKeeperServer@604] - Established session 0x359943906c30692 with negotiated timeout 40000 for client /10.128.0.171:44776
    2017-02-08 04:34:00,002 [myid:3] - INFO  [CommitProcessor:3:NIOServerCnxn@1000] - Closed socket connection for client /10.128.0.171:44776 which had sessionid 0x359943906c30692

On the Moray side, there's a gap in log entries from this pid from 2017-02-08T03:39:07.043Z to 2017-02-08T04:40:47.658Z, and we don't know why (though moray doesn't log much under normal operation, so we can't tell whether entries are missing or there simply weren't any). But we do have kernel state for this connection. First, a fixed point mapping lbolt to wall-clock time:

>  *lb_info::print -t lbolt_info_t; !date
lbolt_info_t {
    union  id = {
        cyclic_id_t lbi_cyclic_id = 0xffffd062de650d40
        int64_t lbi_id_pad = 0xffffd062de650d40
    }
    int64_t lbi_thresh_calls = 0x4b
    int64_t lbi_thresh_interval = 0xa
    int64_t lbi_debug_ts = 0
    int64_t lbi_debug_time = 0x775eb74d
    int64_t lbi_internal = 0x192a17be2
    uint32_t lbi_token = 0
    boolean_t lbi_cyc_deactivate = 0x1 (B_TRUE)
    int64_t lbi_cyc_deac_start = 0x192a17be1
}
February  8, 2017 07:51:28 PM UTC

Now, let's get the kernel state for this file descriptor:

> 0t5374::pid2proc | ::pfiles ! grep -w 524
 524 SOCK ffffdcd1b2285000 socket: AF_INET 10.128.0.171 44776 remote: AF_INET 10.132.0.129 2181

I'm going to dump everything about this for future reference, but only a few of these fields are useful:

> ffffdcd1b2285000::print vnode_t v_data | ::print -t struct sonode so_proto_handle | ::print -t conn_t 
conn_t {
    kmutex_t conn_lock = {
        void *[1] _opaque = [ 0 ]
    }
    uint32_t conn_ref = 0x3
    uint32_t conn_flags = 0x40001001
    union  conn_proto_priv = {
        tcp_t *cp_tcp = 0xffffddf6bb394a40
        struct udp_s *cp_udp = 0xffffddf6bb394a40
        struct icmp_s *cp_icmp = 0xffffddf6bb394a40
        struct rts_s *cp_rts = 0xffffddf6bb394a40
        struct iptun_s *cp_iptun = 0xffffddf6bb394a40
        struct sctp_s *cp_sctp = 0xffffddf6bb394a40
        void *cp_priv = 0xffffddf6bb394a40
    }
    kcondvar_t conn_cv = {
        ushort_t _opaque = 0
    }
    uint8_t conn_proto = 0x6
    edesc_rpf conn_recv = tcp_input_data
    edesc_rpf conn_recvicmp = tcp_icmp_input
    edesc_vpf conn_verifyicmp = tcp_verifyicmp
    ip_xmit_attr_t *conn_ixa = 0xffffdcd1b0df1a00
    crb_t conn_recv_ancillary = {
        union  crbu = {
            uint32_t crbu_all = 0
            struct  crbb = {
                unsigned int:1 crbb_recvdstaddr :1 = 0
                unsigned int:1 crbb_recvopts :1 = 0
                unsigned int:1 crbb_recvif :1 = 0
                unsigned int:1 crbb_recvslla :1 = 0
                unsigned int:1 crbb_recvttl :1 = 0
                unsigned int:1 crbb_ip_recvpktinfo :1 = 0
                unsigned int:1 crbb_ipv6_recvhoplimit :1 = 0
                unsigned int:1 crbb_ipv6_recvhopopts :1 = 0
                unsigned int:1 crbb_ipv6_recvdstopts :1 = 0
                unsigned int:1 crbb_ipv6_recvrthdr :1 = 0
                unsigned int:1 crbb_old_ipv6_recvdstopts :1 = 0
                unsigned int:1 crbb_ipv6_recvrthdrdstopts :1 = 0
                unsigned int:1 crbb_ipv6_recvtclass :1 = 0
                unsigned int:1 crbb_recvucred :1 = 0
                unsigned int:1 crbb_timestamp :1 = 0
            }
        }
    }
    squeue_t *conn_sqp = 0xffffd063194ba6c0
    uint_t conn_state_flags = 0
    int conn_lingertime = 0
    unsigned int:1 conn_on_sqp :1 = 0
    unsigned int:1 conn_linger :1 = 0
    unsigned int:1 conn_useloopback :1 = 0
    unsigned int:1 conn_broadcast :1 = 0
    unsigned int:1 conn_reuseaddr :1 = 0
    unsigned int:1 conn_keepalive :1 = 0
    unsigned int:1 conn_multi_router :1 = 0
    unsigned int:1 conn_unspec_src :1 = 0
    unsigned int:1 conn_policy_cached :1 = 0x1
    unsigned int:1 conn_in_enforce_policy :1 = 0
    unsigned int:1 conn_out_enforce_policy :1 = 0
    unsigned int:1 conn_debug :1 = 0
    unsigned int:1 conn_ipv6_v6only :1 = 0
    unsigned int:1 conn_oobinline :1 = 0
    unsigned int:1 conn_dgram_errind :1 = 0
    unsigned int:1 conn_exclbind :1 = 0
    unsigned int:1 conn_mdt_ok :1 = 0
    unsigned int:1 conn_allzones :1 = 0
    unsigned int:1 conn_ipv6_recvpathmtu :1 = 0
    unsigned int:1 conn_mcbc_bind :1 = 0
    unsigned int:1 conn_reuseport :1 = 0
    unsigned int:11 conn_pad_to_bit_31 :11 = 0
    boolean_t conn_blocked = 0 (0)
    squeue_t *conn_initial_sqp = 0xffffd063194ba6c0
    squeue_t *conn_final_sqp = 0
    ill_t *conn_dhcpinit_ill = 0
    ipsec_latch_t *conn_latch = 0
    struct ipsec_policy_s *conn_latch_in_policy = 0
    struct ipsec_action_s *conn_latch_in_action = 0
    uint_t conn_bound_if = 0
    queue_t *conn_rq = 0
    queue_t *conn_wq = 0              
    dev_t conn_dev = 0
    vmem_t *conn_minor_arena = 0
    ip_helper_stream_info_t *conn_helper_info = 0
    cred_t *conn_cred = 0xffffd063257091a0
    pid_t conn_cpid = 0x14fe
    uint64_t conn_open_time = 0x117fa21e5
    connf_t *conn_g_fanout = 0xffffd0906ef5ea30
    struct conn_s *conn_g_next = 0xffffd1d179e5b840
    struct conn_s *conn_g_prev = 0xffffd0beeebac180
    struct ipsec_policy_head_s *conn_policy = 0
    in6_addr_t conn_bound_addr_v6 = ::
    connf_t *conn_fanout = 0xfffff0ad54c8b300
    struct conn_s *conn_next = 0
    struct conn_s *conn_prev = 0
    struct  connua_v6addr = {
        in6_addr_t connua_laddr = ::ffff:10.128.0.171
        in6_addr_t connua_faddr = ::ffff:10.132.0.129
    }
    in6_addr_t conn_saddr_v6 = ::ffff:10.128.0.171
    union  u_port = {
        uint32_t connu_ports2 = 0xe8ae8508
        struct  connu_ports = {
            in_port_t connu_fport = 0x8508
            in_port_t connu_lport = 0xe8ae
        }
    }
    uint_t conn_incoming_ifindex = 0
    ill_t *conn_oper_pending_ill = 0
    krwlock_t conn_ilg_lock = {
        void *[1] _opaque = [ 0 ]
    }
    ilg_t *conn_ilg = 0
    kcondvar_t conn_refcv = {
        ushort_t _opaque = 0
    }
    struct conn_s *conn_drain_next = 0
    struct conn_s *conn_drain_prev = 0
    idl_t *conn_idl = 0
    mblk_t *conn_ipsec_opt_mp = 0
    zoneid_t conn_zoneid = 0
    int conn_rtaware = 0
    kcondvar_t conn_sq_cv = {
        ushort_t _opaque = 0
    }
    sock_upcalls_t *conn_upcalls = so_upcalls
    sock_upper_handle_t conn_upper_handle = 0xffffdcd5933cf7b0
    unsigned int:2 conn_mlp_type :2 = 0
    unsigned int:1 conn_anon_mlp :1 = 0
    unsigned int:1 conn_anon_port :1 = 0
    unsigned int:2 conn_mac_mode :2 = 0
    unsigned int:1 conn_anon_priv_bind :1 = 0
    unsigned int:1 conn_zone_is_global :1 = 0
    unsigned int:1 conn_isvrrp :1 = 0
    unsigned int:23 conn_spare :23 = 0
    boolean_t conn_flow_cntrld = 0 (0)
    netstack_t *conn_netstack = 0xffffd090ce6ea900
    ushort_t conn_ipversion = 0x4
    sa_family_t conn_family = 0x2
    uint_t conn_so_type = 0x2
    uint_t conn_sndbuf = 0x1f400
    uint_t conn_rcvbuf = 0x1004c8
    uint_t conn_wroff = 0x78
    uint_t conn_sndlowat = 0x3200
    uint_t conn_rcvlowat = 0
    uint8_t conn_default_ttl = 0x40
    uint32_t conn_flowinfo = 0
    in6_addr_t conn_v6lastdst = ::
    ushort_t conn_lastipversion = 0
    in_port_t conn_lastdstport = 0
    uint32_t conn_lastflowinfo = 0
    uint_t conn_lastscopeid = 0
    uint_t conn_lastsrcid = 0
    in6_addr_t conn_v6lastsrc = ::
    ip_pkt_t conn_xmit_ipp = {
        uint_t ipp_fields = 0
        in6_addr_t ipp_addr = ::
        uint_t ipp_unicast_hops = 0x40
        uint_t ipp_hoplimit = 0
        uint_t ipp_hopoptslen = 0
        uint_t ipp_rthdrdstoptslen = 0
        uint_t ipp_rthdrlen = 0
        uint_t ipp_dstoptslen = 0
        uint_t ipp_fraghdrlen = 0
        ip6_hbh_t *ipp_hopopts = 0
        ip6_dest_t *ipp_rthdrdstopts = 0
        ip6_rthdr_t *ipp_rthdr = 0
        ip6_dest_t *ipp_dstopts = 0
        ip6_frag_t *ipp_fraghdr = 0
        uint8_t ipp_tclass = 0
        uint8_t ipp_type_of_service = 0
        uint_t ipp_ipv4_options_len = 0
        uint8_t *ipp_ipv4_options = 0
        uint_t ipp_label_len_v4 = 0
        uint8_t *ipp_label_v4 = 0
        uint_t ipp_label_len_v6 = 0
        uint8_t *ipp_label_v6 = 0
    }
    uint8_t *conn_ht_iphc = 0xffffe3a84d7ac830
    uint_t conn_ht_iphc_allocated = 0x50
    uint_t conn_ht_iphc_len = 0x34
    uint8_t *conn_ht_ulp = 0xffffe3a84d7ac844
    uint_t conn_ht_ulp_len = 0x20
    uint32_t conn_sum = 0
    uint32_t conn_ioctlref = 0
}


> ffffdcd1b2285000::print vnode_t v_data | ::print -t struct sonode so_proto_handle | ::print -t conn_t conn_proto_priv.cp_tcp[] 
tcp_t conn_proto_priv.cp_tcp = {
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_time_wait_next = 0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_time_wait_prev = 0
    int64_t conn_proto_priv.cp_tcp->tcp_time_wait_expire = 0
    struct conn_s *conn_proto_priv.cp_tcp->tcp_connp = 0xffffddf6bb394740
    tcp_stack_t *conn_proto_priv.cp_tcp->tcp_tcps = 0xffffd0bee63a5200
    int32_t conn_proto_priv.cp_tcp->tcp_state = 0
    int32_t conn_proto_priv.cp_tcp->tcp_rcv_ws = 0x5
    int32_t conn_proto_priv.cp_tcp->tcp_snd_ws = 0x5
    uint32_t conn_proto_priv.cp_tcp->tcp_ts_recent = 0x178f8126
    clock_t conn_proto_priv.cp_tcp->tcp_rto = 0x6c8
    int64_t conn_proto_priv.cp_tcp->tcp_last_rcv_lbolt = 0x117fa25df
    uint32_t conn_proto_priv.cp_tcp->tcp_rto_initial = 0x3e8
    uint32_t conn_proto_priv.cp_tcp->tcp_rto_min = 0x190
    uint32_t conn_proto_priv.cp_tcp->tcp_rto_max = 0xea60
    uint32_t conn_proto_priv.cp_tcp->tcp_snxt = 0xf1ba6d31
    uint32_t conn_proto_priv.cp_tcp->tcp_swnd = 0x1004c0
    uint32_t conn_proto_priv.cp_tcp->tcp_mss = 0x5a8
    uint32_t conn_proto_priv.cp_tcp->tcp_iss = 0xf1ba6d00
    uint32_t conn_proto_priv.cp_tcp->tcp_rnxt = 0xc4418b76
    uint32_t conn_proto_priv.cp_tcp->tcp_rwnd = 0x1004c8
    mblk_t *conn_proto_priv.cp_tcp->tcp_xmit_head = 0
    mblk_t *conn_proto_priv.cp_tcp->tcp_xmit_last = 0xffffd0907fb78600
    mblk_t *conn_proto_priv.cp_tcp->tcp_xmit_tail = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_unsent = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_xmit_tail_unsent = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_suna = 0xf1ba6d31
    uint32_t conn_proto_priv.cp_tcp->tcp_rexmit_nxt = 0xf1ba6d31
    uint32_t conn_proto_priv.cp_tcp->tcp_rexmit_max = 0xf1ba6d31
    uint32_t conn_proto_priv.cp_tcp->tcp_cwnd = 0xb50
    int32_t conn_proto_priv.cp_tcp->tcp_cwnd_cnt = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_ibsegs = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_obsegs = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_naglim = 0x1
    uint32_t conn_proto_priv.cp_tcp->tcp_valid_bits = 0
    timeout_id_t conn_proto_priv.cp_tcp->tcp_timer_tid = 0
    uchar_t conn_proto_priv.cp_tcp->tcp_timer_backoff = 0
    int64_t conn_proto_priv.cp_tcp->tcp_last_recv_time = 0x117fa2a51
    uint32_t conn_proto_priv.cp_tcp->tcp_init_cwnd = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_urp_last_valid :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_hard_binding :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_fin_acked :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_fin_rcvd :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_fin_sent :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_ordrel_done :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_detached :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_zero_win_probe :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_loopback :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_localnet :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_syn_defense :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_set_timer :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_active_open :1 = 0x1
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_rexmit :1 = 0x1
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_snd_sack_ok :1 = 0x1
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_hwcksum :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_ip_forward_progress :1 = 0x1
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_ecn_ok :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_ecn_echo_on :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_ecn_cwr_sent :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_cwr :1 = 0
    unsigned int:11 conn_proto_priv.cp_tcp->tcp_pad_to_bit31 :11 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_snd_ts_ok :1 = 0x1
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_snd_ws_ok :1 = 0x1
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_reserved_port :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_in_free_list :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_snd_zcopy_on :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_snd_zcopy_aware :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_xmit_zc_clean :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_wait_for_eagers :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_accept_error :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_send_discon_ind :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_cork :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_tconnind_started :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_lso :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_is_wnd_shrnk :1 = 0
    unsigned int:18 conn_proto_priv.cp_tcp->tcp_pad_to_bit_31 :18 = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_initial_pmtu = 0x5dc
    mblk_t *conn_proto_priv.cp_tcp->tcp_reass_head = 0
    mblk_t *conn_proto_priv.cp_tcp->tcp_reass_tail = 0
    tcp_sack_info_t conn_proto_priv.cp_tcp->tcp_sack_info = {
        int32_t tcp_pipe = 0
        tcp_seq tcp_fack = 0
        tcp_seq tcp_sack_snxt = 0
        int32_t tcp_max_sack_blk = 0x3
        int32_t tcp_num_sack_blk = 0
        sack_blk_t [5] tcp_sack_list = [
            sack_blk_t {
                tcp_seq begin = 0
                tcp_seq end = 0
            },
            sack_blk_t {
                tcp_seq begin = 0
                tcp_seq end = 0
            },
            sack_blk_t {
                tcp_seq begin = 0
                tcp_seq end = 0
            },
            sack_blk_t {
                tcp_seq begin = 0
                tcp_seq end = 0
            },
            sack_blk_t {
                tcp_seq begin = 0
                tcp_seq end = 0
            },
        ]
        int32_t tcp_num_notsack_blk = 0
        uint32_t tcp_cnt_notsack_list = 0
        notsack_blk_t *tcp_notsack_list = 0
    }
    mblk_t *conn_proto_priv.cp_tcp->tcp_rcv_list = 0
    mblk_t *conn_proto_priv.cp_tcp->tcp_rcv_last_head = 0
    mblk_t *conn_proto_priv.cp_tcp->tcp_rcv_last_tail = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_rcv_cnt = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_cwnd_ssthresh = 0xb50
    uint32_t conn_proto_priv.cp_tcp->tcp_cwnd_max = 0x1004c8
    uint32_t conn_proto_priv.cp_tcp->tcp_csuna = 0xf1ba6d31
    clock_t conn_proto_priv.cp_tcp->tcp_rtt_sa = 0x121e
    clock_t conn_proto_priv.cp_tcp->tcp_rtt_sd = 0x3f5
    clock_t conn_proto_priv.cp_tcp->tcp_rtt_update = 0x1
    clock_t conn_proto_priv.cp_tcp->tcp_ms_we_have_waited = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_swl1 = 0xc4418b9e
    uint32_t conn_proto_priv.cp_tcp->tcp_swl2 = 0xf1ba6d31
    uint32_t conn_proto_priv.cp_tcp->tcp_rack = 0xc4418b76
    uint32_t conn_proto_priv.cp_tcp->tcp_rack_cnt = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_rack_cur_max = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_rack_abs_max = 0x2
    timeout_id_t conn_proto_priv.cp_tcp->tcp_ack_tid = 0
    timeout_id_t conn_proto_priv.cp_tcp->tcp_push_tid = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_max_swnd = 0x1004c0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_listener = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_irs = 0xc4418b75
    uint32_t conn_proto_priv.cp_tcp->tcp_fss = 0xf1ba6cff
    uint32_t conn_proto_priv.cp_tcp->tcp_urg = 0
    clock_t conn_proto_priv.cp_tcp->tcp_first_timer_threshold = 0x2710
    clock_t conn_proto_priv.cp_tcp->tcp_second_timer_threshold = 0x493e0
    clock_t conn_proto_priv.cp_tcp->tcp_first_ctimer_threshold = 0x2710
    clock_t conn_proto_priv.cp_tcp->tcp_second_ctimer_threshold = 0x2bf20
    uint32_t conn_proto_priv.cp_tcp->tcp_urp_last = 0
    mblk_t *conn_proto_priv.cp_tcp->tcp_urp_mp = 0
    mblk_t *conn_proto_priv.cp_tcp->tcp_urp_mark_mp = 0
    int conn_proto_priv.cp_tcp->tcp_conn_req_cnt_q0 = 0
    int conn_proto_priv.cp_tcp->tcp_conn_req_cnt_q = 0
    int conn_proto_priv.cp_tcp->tcp_conn_req_max = 0
    t_scalar_t conn_proto_priv.cp_tcp->tcp_conn_req_seqnum = 0
    kmutex_t conn_proto_priv.cp_tcp->tcp_eager_lock = {
        void *[1] _opaque = [ 0 ]
    }
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_eager_next_q = 0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_eager_last_q = 0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_eager_next_q0 = 0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_eager_prev_q0 = 0
    boolean_t conn_proto_priv.cp_tcp->tcp_conn_def_q0 = 0 (0)
    union  conn_proto_priv.cp_tcp->tcp_conn = {
        mblk_t *tcp_eager_conn_ind = 0
        mblk_t *tcp_opts_conn_req = 0
    }
    uint32_t conn_proto_priv.cp_tcp->tcp_syn_rcvd_timeout = 0
    int32_t conn_proto_priv.cp_tcp->tcp_ka_last_intrvl = 0
    timeout_id_t conn_proto_priv.cp_tcp->tcp_ka_tid = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_ka_interval = 0x6ddd00
    uint32_t conn_proto_priv.cp_tcp->tcp_ka_rinterval = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_ka_abort_thres = 0x75300
    uint32_t conn_proto_priv.cp_tcp->tcp_ka_cnt = 0
    int32_t conn_proto_priv.cp_tcp->tcp_client_errno = 0
    ipha_t *conn_proto_priv.cp_tcp->tcp_ipha = 0xffffe3a84d7ac830
    ip6_t *conn_proto_priv.cp_tcp->tcp_ip6h = 0
    tcpha_t *conn_proto_priv.cp_tcp->tcp_tcpha = 0xffffe3a84d7ac844
    uint16_t conn_proto_priv.cp_tcp->tcp_last_sent_len = 0x30
    uint16_t conn_proto_priv.cp_tcp->tcp_last_recv_len = 0
    uint16_t conn_proto_priv.cp_tcp->tcp_dupack_cnt = 0
    kmutex_t *conn_proto_priv.cp_tcp->tcp_acceptor_lockp = 0
    mblk_t *conn_proto_priv.cp_tcp->tcp_ordrel_mp = 0
    t_uscalar_t conn_proto_priv.cp_tcp->tcp_acceptor_id = 0
    int conn_proto_priv.cp_tcp->tcp_ipsec_overhead = 0
    uint_t conn_proto_priv.cp_tcp->tcp_recvifindex = 0
    uint_t conn_proto_priv.cp_tcp->tcp_recvhops = 0
    uint_t conn_proto_priv.cp_tcp->tcp_recvtclass = 0
    ip6_hbh_t *conn_proto_priv.cp_tcp->tcp_hopopts = 0
    ip6_dest_t *conn_proto_priv.cp_tcp->tcp_dstopts = 0
    ip6_dest_t *conn_proto_priv.cp_tcp->tcp_rthdrdstopts = 0
    ip6_rthdr_t *conn_proto_priv.cp_tcp->tcp_rthdr = 0
    uint_t conn_proto_priv.cp_tcp->tcp_hopoptslen = 0
    uint_t conn_proto_priv.cp_tcp->tcp_dstoptslen = 0
    uint_t conn_proto_priv.cp_tcp->tcp_rthdrdstoptslen = 0
    uint_t conn_proto_priv.cp_tcp->tcp_rthdrlen = 0
    mblk_t *conn_proto_priv.cp_tcp->tcp_timercache = 0xffffd0909077d148
    kmutex_t conn_proto_priv.cp_tcp->tcp_closelock = {
        void *[1] _opaque = [ 0 ]
    }
    kcondvar_t conn_proto_priv.cp_tcp->tcp_closecv = {
        ushort_t _opaque = 0
    }
    uint8_t conn_proto_priv.cp_tcp->tcp_closed = 0
    uint8_t conn_proto_priv.cp_tcp->tcp_closeflags = 0
    mblk_t conn_proto_priv.cp_tcp->tcp_closemp = {
        struct msgb *b_next = 0
        struct msgb *b_prev = 0
        struct msgb *b_cont = 0
        unsigned char *b_rptr = 0
        unsigned char *b_wptr = 0
        struct datab *b_datap = 0
        unsigned char b_band = 0
        unsigned char b_tag = 0
        unsigned short b_flag = 0
        queue_t *b_queue = 0
    }
    timeout_id_t conn_proto_priv.cp_tcp->tcp_linger_tid = 0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_acceptor_hash = 0
    struct tcp_s **conn_proto_priv.cp_tcp->tcp_ptpahn = 0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_bind_hash = 0xffffd0bf08f733c0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_bind_hash_port = 0
    struct tcp_s **conn_proto_priv.cp_tcp->tcp_ptpbhn = 0xffffd09087e9fa00
    struct tcp_rg_s *conn_proto_priv.cp_tcp->tcp_rg_bind = 0
    uint_t conn_proto_priv.cp_tcp->tcp_maxpsz_multiplier = 0xa
    uint32_t conn_proto_priv.cp_tcp->tcp_lso_max = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_ofo_fin_seq = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_cwr_snd_max = 0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_saved_listener = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_in_ack_unsent = 0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_loopback_peer = 0
    mblk_t *conn_proto_priv.cp_tcp->tcp_fused_sigurg_mp = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_fused :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_unfusable :1 = 0
    unsigned int:1 conn_proto_priv.cp_tcp->tcp_fused_sigurg :1 = 0
    unsigned int:29 conn_proto_priv.cp_tcp->tcp_fuse_to_bit_31 :29 = 0
    kmutex_t conn_proto_priv.cp_tcp->tcp_non_sq_lock = {
        void *[1] _opaque = [ 0 ]
    }
    boolean_t conn_proto_priv.cp_tcp->tcp_issocket = 0x1 (B_TRUE)
    uint32_t conn_proto_priv.cp_tcp->tcp_squeue_bytes = 0
    boolean_t conn_proto_priv.cp_tcp->tcp_closemp_used = 0 (0)
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_eager_prev_drop_q0 = 0
    struct tcp_s *conn_proto_priv.cp_tcp->tcp_eager_next_drop_q0 = 0
    boolean_t conn_proto_priv.cp_tcp->tcp_flow_stopped = 0 (0)
    uint32_t conn_proto_priv.cp_tcp->tcp_snxt_shrunk = 0
    sock_connid_t conn_proto_priv.cp_tcp->tcp_connid = 0x1
    mblk_t *conn_proto_priv.cp_tcp->tcp_rsrv_mp = 0xffffd0a4b5f6f7a0
    kmutex_t conn_proto_priv.cp_tcp->tcp_rsrv_mp_lock = {
        void *[1] _opaque = [ 0 ]
    }
    struct tcp_listen_cnt_s *conn_proto_priv.cp_tcp->tcp_listen_cnt = 0
    timeout_id_t conn_proto_priv.cp_tcp->tcp_reass_tid = 0
    uint32_t conn_proto_priv.cp_tcp->tcp_fin_wait_2_flush_interval = 0xea60
}

The key fields are conn_open_time = 0x117fa21e5 and tcp_last_recv_time = 0x117fa2a51. We can compute how long ago these were by subtracting them from the lbolt_internal we printed above, after also subtracting lbolt_debug_time (because that appears to be how the kernel computes lbolt):

> 0x192a17be2-0x775eb74d-0x117fa2a51=E
                55089732       

and we can subtract this from the timestamp we recorded above for lbolt 0x192a17be2-0x775eb74d.

> new Date(new Date('February  8, 2017 07:51:28 PM UTC').getTime() - 55089732).toISOString()
'2017-02-08T04:33:18.268Z'
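The same arithmetic can be replayed end-to-end in plain JavaScript (a sketch; it bakes in the same assumption the computation above makes, namely that these lbolt values are in milliseconds):

```javascript
// Replaying the analysis arithmetic with the values printed from the dump.
const lboltNow   = 0x192a17be2; // lbolt_internal recorded earlier
const lboltDebug = 0x775eb74d;  // lbolt_debug_time
const lastRecv   = 0x117fa2a51; // tcp_last_recv_time

// How long ago (in ms, per the assumption above) the last packet arrived:
const deltaMs = lboltNow - lboltDebug - lastRecv;

// Wall-clock time recorded for that lbolt reading (07:51:28 PM UTC):
const wallclock = Date.parse('2017-02-08T19:51:28Z');
const lastRecvIso = new Date(wallclock - deltaMs).toISOString();

console.log(deltaMs, lastRecvIso); // 55089732 2017-02-08T04:33:18.268Z
```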

From this, we have that the connection was opened close to 2017-02-08T04:33:16.112Z and last received a packet close to 2017-02-08T04:33:18.268Z. That's reasonably consistent with the ZooKeeper logs above, which showed that ZK accepted the connection at 2017-02-08 04:33:17.977, and it timed it out after 40 seconds of inactivity at 2017-02-08T04:34:00.002Z.

It looks from this like the moray process established a connection around 04:33:16Z, but then didn't do anything with it. 40 seconds later, ZK timed it out and closed it, but the Moray side never found out, which suggests a partition. But the fact that the Moray side didn't separately time out the connection suggests that it had not sent any unacknowledged data over it, which basically always indicates a bug for a ZooKeeper client (which is supposed to heartbeat every few seconds).

We have a core file from this moray process. Hopefully that will shed some light on why this process isn't trying to send anything over that connection.


Comment by Former user
Created at 2017-02-08T21:51:52.000Z
Updated at 2017-02-08T21:56:31.000Z

Also of note: tcp_ibsegs and tcp_obsegs are both 0, but tcp_last_sent_len is 48, and we're not sure what these mean.


Comment by Former user
Created at 2017-02-08T22:18:15.000Z

In the core dump, we can see that the current zk client ConnectionManager is stuck waiting for the ConnectResponse frame:

> ::findjsobjects -c ConnectionManager | ::findjsobjects | ::jsprint state socket.destroyed socket._connecting socket._bytesDispatched socket.bytesRead connectTimeoutHandler.ontimeout
1 false false 48 0 null

This is a ConnectionManager that's in STATE_CONNECTING (1), whose socket is neither destroyed nor connecting, and whose connect timeout has been cleared. It has sent 48 bytes (the size of a ConnectRequest) and received none, so it's definitely in the indefinite stuck state where it will send no more traffic until it receives a ConnectResponse from ZK.
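The 48 bytes line up with the on-the-wire size of a serialized ZooKeeper ConnectRequest, assuming the classic jute encoding with the default 16-byte session password:

```javascript
// Byte accounting for a serialized ZooKeeper ConnectRequest (jute encoding,
// no trailing readOnly byte; 16-byte session password is the ZK default).
const connectRequestBytes =
    4 +  // frame length prefix
    4 +  // protocolVersion (int)
    8 +  // lastZxidSeen (long)
    4 +  // timeOut (int)
    8 +  // sessionId (long)
    4 +  // passwd buffer length (int)
    16;  // passwd payload
console.log(connectRequestBytes); // 48
```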

FWIW I've added a zk-stuck analyzer to thoth to identify gcores of processes that are in this state.
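The mdb check above amounts to the following predicate (a hypothetical helper, not real analyzer code; `cm` stands for a node-zookeeper-client ConnectionManager, read via the same internal fields the ::jsprint one-liner inspects):

```javascript
// Hypothetical predicate mirroring the ::jsprint check above: true when a
// ConnectionManager is stuck waiting on ZooKeeper's ConnectResponse.
function looksStuck(cm) {
    return cm.state === 1 &&                         // STATE_CONNECTING
        cm.socket.destroyed === false &&             // TCP socket still open
        cm.socket._connecting === false &&           // ... and fully connected
        cm.socket._bytesDispatched === 48 &&         // ConnectRequest was sent
        cm.socket.bytesRead === 0 &&                 // no ConnectResponse back
        cm.connectTimeoutHandler.ontimeout === null; // connect timer cleared
}
```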


Comment by Former user
Created at 2017-02-08T23:25:48.000Z

Alex took a look at the core file and confirmed that this appears to be the same root cause as MANATEE-288: the node zookeeper client is stuck in the state transition where it has a connected TCP socket but has not received a response from ZooKeeper to its L7 connect request.

I checked whether the shard 1 Moray processes in east1b and east1c were affected by this: I logged in, ran moraystat.d, and compared the output to the above. All four processes had working connections to PostgreSQL. As a result, I was able to mitigate the problem on shard 1 by taking the east1a zone out of service by running "svcadm disable -s registrar" there.

After doing that, we found we were still seeing request failures at muskie, which were reporting ECONNREFUSED errors from node-fast. That's consistent with the same issue, and there's no reason to expect it would be limited to shard 1; the failures appeared to be spread across all shards. I went to see which Moray processes were affected using this command:

[root@headnode (us-east-1c) ~]# manta-oneach -s moray 'grep -c ECONNREFUSED /var/log/moray.log'
SERVICE          ZONE     OUTPUT
moray            02cbb82e 0
moray            187de16b 0
moray            1ea718ae 0
moray            23a267ae 0
moray            246790af 0
moray            2e130d3a 0
moray            2fc31cf5 0
moray            36dfa71a 0
moray            3b603936 0
moray            3e71cad1 0
moray            40a6317c 0
moray            46a37d98 0
moray            4783cdfe 0
moray            54d6acea 0
moray            6769cb7d 0
moray            8757af0d 0
moray            9bedb32a 0
moray            9cf77575 0
moray            a249f546 0
moray            a25c4080 0
moray            a8855e3f 0
moray            b70034da 0
moray            c2ff63c0 0
moray            cfa28605 0
moray            f4bcc227 0

east1b looked just like east1c -- no errors in any process -- but east1a showed one moray process in each of four zones:

[root@headnode (us-east-1a) ~]# manta-oneach -s moray 'grep -c ECONNREFUSED /var/log/moray.log'
SERVICE          ZONE     OUTPUT
moray            0089fcd7 0
moray            08b021cf 0
moray            0a351171 867
moray            17b37a11 0
moray            1d11226b 87
moray            2bd7a375 0
moray            2d309372 0
moray            2dd7cb5b 228
moray            2eb18f5c 0
moray            4228c63b 0
moray            4e6e76f0 0
moray            59fd11a8 167
moray            9187bb05 0
moray            97d2dea1 0
moray            98d70f0e 0
moray            ae75f0f5 0
moray            b5b915ab 0
moray            c62ea126 0
moray            cf21edd3 0
moray            d373891f 0
moray            d63cfaae 0
moray            d7fd4e9d 0
moray            dd07da56 0
moray            e143ca38 0
moray            f2b69942 0

Specifically, it's these pids:

[root@headnode (us-east-1a) ~]# manta-oneach -s moray 'grep ECONNREFUSED /var/log/moray.log | json -ga pid msg | sort | uniq -c'
SERVICE          ZONE     OUTPUT
moray            0089fcd7 
moray            08b021cf 
moray            0a351171    2000 4832 unable to create PG client
moray            17b37a11 
moray            1d11226b    4171 16018 unable to create PG client
moray            2bd7a375 
moray            2d309372 
moray            2dd7cb5b    4811 22252 unable to create PG client
moray            2eb18f5c 
moray            4228c63b 
moray            4e6e76f0 
moray            59fd11a8    3081 54191 unable to create PG client
moray            9187bb05 
moray            97d2dea1 
moray            98d70f0e 
moray            ae75f0f5 
moray            b5b915ab 
moray            c62ea126 
moray            cf21edd3 
moray            d373891f 
moray            d63cfaae 
moray            d7fd4e9d 
moray            dd07da56 
moray            e143ca38 
moray            f2b69942 

I modified the invocation to give me the full zonename:

[root@headnode (us-east-1a) ~]# manta-oneach -s moray 'grep -c ECONNREFUSED /var/log/moray.log > /dev/null && zonename'
SERVICE          ZONE     OUTPUT
moray            0089fcd7 
moray            08b021cf 
moray            0a351171 0a351171-0080-443e-9e0e-a26f99f56dac
moray            17b37a11 
moray            1d11226b 1d11226b-bae1-4667-a3cb-20775a981f01
moray            2bd7a375 
moray            2d309372 
moray            2dd7cb5b 2dd7cb5b-3a38-443a-b04b-826645263e41
moray            2eb18f5c 
moray            4228c63b 
moray            4e6e76f0 
moray            59fd11a8 59fd11a8-3556-4990-92b2-e293deaadd74
moray            9187bb05 
moray            97d2dea1 
moray            98d70f0e 
moray            ae75f0f5 
moray            b5b915ab 
moray            c62ea126 
moray            cf21edd3 
moray            d373891f 
moray            d63cfaae 
moray            d7fd4e9d 
moray            dd07da56 
moray            e143ca38 
moray            f2b69942 

Then I disabled registrar on each of these to take them out of service:

[root@headnode (us-east-1a) ~]# manta-oneach -z 0a351171-0080-443e-9e0e-a26f99f56dac,1d11226b-bae1-4667-a3cb-20775a981f01,2dd7cb5b-3a38-443a-b04b-826645263e41,59fd11a8-3556-4990-92b2-e293deaadd74 'svcadm disable registrar'
SERVICE          ZONE     OUTPUT
moray            0a351171 
moray            1d11226b 
moray            2dd7cb5b 
moray            59fd11a8 

Shortly thereafter, the error rate reported by mlive went to about zero.

For reference, these were shards 13, 20, 22, and 16, respectively. One open question: why did requests for many shards appear affected if only these four were degraded? A PUT request requires fetching from two shards -- that of the parent directory and that of the object itself -- and the log only records one of them.
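A rough model suggests why that would spread the visible impact (illustrative only: n = 26 shards is purely an assumed count, and this ignores retries and uneven shard load):

```javascript
// Back-of-the-envelope: if k of n shards are degraded and each PUT touches
// two shards (parent directory + object), the fraction of PUTs touching at
// least one degraded shard is roughly 1 - ((n - k) / n)^2, assuming shards
// are chosen independently and uniformly.
function affectedFraction(n, k) {
    const healthy = (n - k) / n;
    return 1 - healthy * healthy;
}
console.log(affectedFraction(26, 4).toFixed(3)); // 0.284
```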

There are a number of additional tickets to file related to this issue, mostly to improve debugging.


Comment by Former user
Created at 2017-02-09T01:59:51.000Z

I've restarted the affected moray processes and re-enabled registrar.


Comment by Former user
Created at 2017-02-14T01:00:38.000Z
Updated at 2017-02-14T01:02:59.000Z

So I've come up with a horrifying way to reproduce this in the lab easily:

First, set this up on a nameservice instance (note: it will stop the entire ZooKeeper process for 1 second every time a new client connects!):

svcs -p zookeeper | awk '$3 == "java" { print $2; exit(0); }' | xargs \
  dtrace -w \
  -n 'pid$1::accept:return { newsock[arg1] = 1; }' \
  -n 'pid$1::write:entry /newsock[arg0] == 1 && (arg2 == 40 || arg2 == 41)/ { newsock[arg0] = 0; stop(); system("sleep 1; prun %d", pid); }' \
  -n 'pid$1::close:entry /newsock[arg0] == 1/ { newsock[arg0] = 0; }'
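(The `arg2 == 40 || arg2 == 41` predicate matches the server's write of the ConnectResponse frame. Assuming the standard jute encoding, those are its two possible sizes, without and with the trailing readOnly byte:)

```javascript
// Byte accounting for a serialized ZooKeeper ConnectResponse: 40 bytes in
// the classic encoding, 41 when the readOnly flag (ZK >= 3.4) is appended.
const connectResponseBytes =
    4 +  // frame length prefix
    4 +  // protocolVersion (int)
    4 +  // timeOut (int)
    8 +  // sessionId (long)
    4 +  // passwd buffer length (int)
    16;  // passwd payload
console.log(connectResponseBytes, connectResponseBytes + 1); // 40 41
```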

Then do this to a moray:

ipdadm remove drop && svcadm disable moray-2021 && sleep 5 && \
  svcadm enable moray-2021 && sleep 0.9 && svcs -p moray-2021 | awk '$3 == "node" { print $2 }' | xargs pfiles | grep 2181 && \
  ipdadm drop 100 && sleep 300 && ipdadm remove drop

(Make sure the grep shows the IP of the nameservice instance you started DTrace on; if it doesn't, Ctrl-C this and try again until you get the right one.)

Now after ~5 min, with a little luck you'll have a fully reconnected (and stuck) zk client in your moray-2021 process:

> ::findjsobjects -c ConnectionManager | ::findjsobjects | ::jsprint state socket.destroyed socket._connecting socket._bytesDispatched socket.bytesRead connectTimeoutHandler.ontimeout
undefined undefined undefined undefined undefined undefined
1 false false 48 0 null
33564671 undefined undefined undefined undefined undefined

Now that it's stuck, all you have to do is trigger a failover in one of your manatee clusters (exercise left to the reader) and ta-da!


Comment by Former user
Created at 2017-02-21T19:19:40.000Z

The fix for this (part 1): https://cr.joyent.us/#/c/1529/

Testing:


Comment by Former user
Created at 2017-02-21T19:27:18.000Z

node-manatee commit c2ad9ae (branch master, by Alex Wilson)

MORAY-398 moray caught connecting to deposed peer
Reviewed by: Cody Peter Mello <cody.mello@joyent.com>
Approved by: David Pacheco <dap@joyent.com>


Comment by Former user
Created at 2017-02-22T00:30:34.000Z

moray commit 257556c (branch master, by Alex Wilson)

MORAY-398 moray caught connecting to deposed peer
Reviewed by: David Pacheco <dap@joyent.com>
Reviewed by: Cody Peter Mello <cody.mello@joyent.com>
Approved by: David Pacheco <dap@joyent.com>


Comment by Former user
Created at 2017-02-27T18:41:01.000Z

@accountid:62561a9f9770e60071696c8f, can we mark this fixed now?