OS-7586: ipf sometimes freezes RFC 1323 transfers

Details

Issue Type: Bug
Priority: 4 - Normal
Status: Resolved
Created at: 2017-02-23T01:42:56.000Z
Updated at: 2019-02-20T21:04:46.722Z

People

Created by: Todd Whiteman
Reported by: Todd Whiteman
Assigned to: Dan McDonald

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2019-02-20T21:04:46.708Z)

Fix Versions

2019-02-28 Mind Grapes (Release Date: 2019-02-28)

Description

Docker pull has been having reliability problems (see DOCKER-893) when downloading images (only in the Triton public cloud, not in COAL).

In nightly-1, we were easily able to reproduce the docker pull failure in the imgapi zone, but we could not reproduce the problem in the cloudapi zone. After checking for differences, I noticed that the cloudapi zone does not have the firewall enabled, but the imgapi zone does.

After disabling the firewall in the imgapi zone:

vmadm update $(vmadm list -o uuid alias=imgapi0) firewall_enabled=false

the docker pull problem goes away.

After enabling the firewall in the cloudapi zone:

vmadm update $(vmadm list -o uuid alias=cloudapi0) firewall_enabled=true

the docker pull problem reappears.

So something in the zone's firewall (ipf) is causing these docker pull requests to stall or hang.

Note: Firewall has been enabled in imgapi since 2013, see HEAD-1744.
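For anyone re-running the toggling comparison above, each zone's current setting can be confirmed from the global zone before changing it. A hedged example, assuming the json(1) CLI is available (as it normally is on SmartOS):

vmadm get $(vmadm list -o uuid alias=imgapi0) | json firewall_enabled
vmadm get $(vmadm list -o uuid alias=cloudapi0) | json firewall_enabled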

Comments

Comment by Todd Whiteman
Created at 2017-02-23T17:16:20.000Z

@accountid:624ba422247a4b0069202447 was asking for more details on how IMGAPI gets stuck, so I'll try to summarise DOCKER-893 below:

Here is an example of the URL IMGAPI is using (note that this is a short-lived URL, so it will likely return a 404 by the time someone reads this); this layer is a 64MB gzipped docker tarball for the centos:6 image:

https://dseasb33srnrn.cloudfront.net/registry-v2/docker/registry/v2/blobs/sha256/32/32c4f4fef1c65e47e41704aabeb99e984f1c01e3541e48354b09300fa5b2d068/data?Expires=1487870734&Signature=jD4R9aVo0~jjdDCl-LhlP2~q5~QxRxau-TXuKvJmtwKjsg0r2uoXliA5VEYcMaO~4BIKe0tJcTHQbW5zA3j18o-2s7SNx7lONwGcFQKR82JE4iovqucF-m8TJqYCyaT8eXlZOlPi7fi4mC~pnMfvDt8~3GgmDkc02zc34ayDXHc_&Key-Pair-Id=APKAJECH5M7VWIS5YZ6Q


Comment by Todd Whiteman
Created at 2017-02-23T20:02:26.000Z

Repro steps:

$ git clone https://github.com/joyent/node-docker-registry-client.git
$ cd node-docker-registry-client/
$ npm install

# At this stage you can rsync this repo (since it's all straight JS code) into a test zone, e.g. cloudapi or imgapi

$ rsync -avz ../node-docker-registry-client "nightly-1:/zones/${ZONE_UUID}/root/root"

# Log into the zone and run:

$ node node-docker-registry-client/examples/v2/downloadImg.js node:latest
Repo: docker.io/node
Wrote manifest: node-latest.manifest
Downloaded layer 10 of 11: node-latest-10-a3ed95caeb02.layer
Downloaded layer 6 of 11: node-latest-6-33d67d70af20.layer
Downloaded layer 3 of 11: node-latest-3-a3ed95caeb02.layer
Downloaded layer 4 of 11: node-latest-4-a3ed95caeb02.layer
Downloaded layer 1 of 11: node-latest-1-a3ed95caeb02.layer
Downloaded layer 5 of 11: node-latest-5-da053401c2b1.layer
Downloaded layer 9 of 11: node-latest-9-fce5728aad85.layer
Downloaded layer 2 of 11: node-latest-2-05b24114aa8d.layer
Downloaded layer 11 of 11: node-latest-11-5040bd298390.layer
Downloaded layer 8 of 11: node-latest-8-76610ec20bf5.layer
node 11 layers                             [===========================================================================================>] 100% 247.18MB  22.34MB/s    11s
Downloaded layer 7 of 11: node-latest-7-9c1bc3c30371.layer
Digest: sha256:3f647bd166629c4d20491f03018b39cd12c90fdfe42086b2147a3ccddca5ffea

The above node command will stall (possibly after a few successful attempts) and the progress bar will get stuck, like below:

node 11 layers       [========================>  ]  96% 237.74MB  71.39MB/s     0s

# and then it will eventually time out and produce an error like this:
downloadImg: error: first of 1 error: unexpected downloaded size: expected 51361210 bytes, downloaded 41466968 bytes

Comment by Todd Whiteman
Created at 2018-08-10T19:31:19.674Z

From @accountid:70121:6490ccfd-5932-4e7a-936d-554bdd3dc0d3

So if you can get that to happen, some /native command output might be useful:
/native/usr/bin/pfiles <pid-of-wedged-curl>
/native/usr/sbin/ipfstat -io

The fact that removing the FW cures this problem suggests something's screwy with ipf.

Seeing the 4-tuple of any TCP connections (from pfiles) vs. the state of the FW rules for that zone might be helpful. (Assuming, of course, as you point out, that it's not on the IMGAPI zone...)

Here is the output from these commands (reproduced in nightly-1 today):

[root@0734c91d-0415-46f1-a041-12e0b6bb3b72 (nightly-1:imgapi0) ~]# pfiles 72756
72756:  curl -L -H Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsI
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0620 dev:566,25 ino:1822165736 uid:0 gid:7 rdev:6,2
      O_RDWR
      /dev/pts/2
      offset:73530
   1: S_IFREG mode:0644 dev:90,65561 ino:98497 uid:0 gid:0 size:21953024
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
      /var/tmp/t/blob
      offset:21953024
   2: S_IFCHR mode:0620 dev:566,25 ino:1822165736 uid:0 gid:7 rdev:6,2
      O_RDWR
      /dev/pts/2
      offset:73530
   3: S_IFDOOR mode:0444 dev:576,0 ino:141 uid:0 gid:0 rdev:575,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[17153]
   4: S_IFSOCK mode:0666 dev:574,0 ino:4786 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(2097152),SO_RCVBUF(2098152)
        sockname: AF_INET 172.26.1.10  port: 34790
        peername: AF_INET 54.175.43.85  port: 443
   5: S_IFSOCK mode:0666 dev:574,0 ino:21751 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(2097152),SO_RCVBUF(2098020)
        sockname: AF_INET 172.26.1.10  port: 48748
        peername: AF_INET 104.18.123.25  port: 443

[root@0734c91d-0415-46f1-a041-12e0b6bb3b72 (nightly-1:imgapi0) ~]# ipfstat -io
empty list for ipfilter(out)
empty list for ipfilter(in)

Comment by Dan McDonald
Created at 2018-08-10T19:35:00.842Z

Hmmm, so nothing obviously local (unless the zone's ipfilter is being controlled by the global zone)... maybe ipfstat -io -z <zonename> from the global zone?


Comment by Todd Whiteman
Created at 2018-08-10T19:48:55.229Z
Updated at 2018-08-10T19:52:22.403Z

To reproduce this, log into the nightly-1 imgapi zone:

ssh nightly-1
sdc-login -l imgapi

Copy/paste this bash function into your imgapi shell:
https://gist.githubusercontent.com/twhiteman/a06e78fc43b830c6ebab29ab68ff60a5/raw/903b41be356790fea1938e4cb223eb1163f0b17b/getDockerBlob.sh

then run:

getBlob "softwareplant/oracle@sha256:faadd00cf98e787e0377cc4bcf8c552437107c15f485b919961739fe42c65f59" -v > /var/tmp/blob

This is a 2.5GB image layer, so it will likely stall somewhere in that download; for example, in nightly-1 this command has currently downloaded only 400M after 20 minutes. If it does not stall, retry.
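The gist itself isn't reproduced here, but a rough equivalent of such a blob-fetching helper looks something like the sketch below. This is only an illustration: the actual gist may differ, and the auth.docker.io / registry-1.docker.io endpoints are the standard Docker Hub v2 token flow rather than anything taken from the gist; json(1) is used to pull the token out of the auth response.

getBlob() {
    # Split "repo@digest" into its two halves.
    local repo="${1%%@*}"
    local digest="${1##*@}"
    shift
    # Fetch a pull token for the repository (standard Docker Hub v2 token flow).
    local token
    token=$(curl -sS "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${repo}:pull" | json token)
    # Stream the blob to stdout; any extra arguments (e.g. -v) are passed on to curl.
    curl -L -H "Authorization: Bearer ${token}" \
        "https://registry-1.docker.io/v2/${repo}/blobs/${digest}" "$@"
}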


Comment by Dan McDonald
Created at 2018-08-10T21:01:15.886Z

The curl process the above script generates has two TCP (HTTPS) connections open: one to an IP that corresponds to a hostname in the URL, and a second that appears to be Cloudflare (likely a redirect to a CDN).  Traffic bursts, then stops moving.  I can't tell where FWAPI's enforcement occurs; once I know that, I can debug the problem further.


Comment by Dan McDonald
Created at 2018-08-10T21:47:37.337Z

Captured the FW rules according to ipf:

[root@headnode (nightly-1) ~]# ipfstat -G 0734c91d-0415-46f1-a041-12e0b6bb3b72 -io
pass out quick proto tcp from any to any flags S/SA keep state
pass out proto tcp from any to any
pass out proto udp from any to any keep state
pass out quick proto icmp from any to any keep state
pass out proto icmp from any to any
pass in quick proto icmp from any to any keep frags
pass in quick proto tcp from 172.25.1.0/24 to any keep frags
pass in quick proto udp from 172.25.1.0/24 to any keep frags
block in all
[root@headnode (nightly-1) ~]# 

And I captured HTTPS traffic from when the curl hung (duplicate ACKs from us + still-arriving packets from the peer == most likely inbound droppage); it is on Manta here: /danmcd/stor/FWAPI-275.snoop
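(A capture like that could be taken from the global zone with something along these lines; the vnic name is a placeholder, and the peer address is simply the Cloudflare peer from the earlier pfiles output, so both are assumptions:)

snoop -d <imgapi0-vnic> -o /var/tmp/FWAPI-275.snoop host 104.18.123.25 and port 443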


Comment by Jira Bot
Created at 2019-02-07T04:06:53.532Z

globe-theatre commit 6870b74d1a0a03c0241e03e1789e1d8622784f1f (branch master, by Todd Whiteman)

TRITON-1198 disable imgapi firewall for test suites that use docker pull


Comment by Dan McDonald
Created at 2019-02-14T21:01:31.140Z

I'm seeing this as well with more generic transfers (scp in my case), but not easily.  Something is triggering packet drops, or at least one packet drop, and it's observable by seeing fr_tcpinwindow() (in ipf) returning 0 when processing a packet with data (i.e. not just an ACK).  There's a hardcoded value in this function, but it's not clear that's the reason the transfer fails.  I.e. it could be correlation, not causation.  In a perfect world this will lead to an illumos bug, or at least an OS- ticket.
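(This is not the DTrace script attached later, but a minimal sketch of that observation might look like the following, assuming fr_tcpinwindow() in the ipf module is visible to fbt and that fr_info_t's fin_dlen carries the TCP payload length:)

dtrace -qn '
fbt:ipf:fr_tcpinwindow:entry { self->dlen = args[0]->fin_dlen; }
fbt:ipf:fr_tcpinwindow:return /arg1 == 0 && self->dlen > 0/
{
        /* A data-carrying packet (not a bare ACK) failed the window check. */
        printf("fr_tcpinwindow rejected a packet carrying %d bytes of data\n", self->dlen);
}
fbt:ipf:fr_tcpinwindow:return { self->dlen = 0; }'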


Comment by Dan McDonald
Created at 2019-02-15T19:54:56.167Z

Examining more recent ipfilter updates (such as the one an illumos community member proposed porting into illumos), it appears the TCP state-maintenance checks have some flaws.  There's a commit in FreeBSD:

https://github.com/freebsd/freebsd/commit/c636cf4e7104540ba66c035229138c1f5f52c374

That commit addressed mishandling of RFC 1323 window scaling.  PARTS of this fix had already appeared in illumos thanks to:

https://github.com/illumos/illumos-gate/commit/f17d2b412cd592f6085d3a1002cac662768ec1ae

but not ALL of them.  This bug will address the full integration of the FreeBSD fix above to match what has been in illumos since the early days of OpenSolaris.


Comment by Dan McDonald
Created at 2019-02-15T21:08:48.986Z

With the proposed fix in place, packet flow for large transfers is much smoother (observed by running snoop -o /dev/null and watching the packet counter no longer stall every n-1000 packets), and the attached DTrace script produces NO OUTPUT.


Comment by Todd Whiteman
Created at 2019-02-15T21:43:21.776Z

Testing in nightly-1 on a platform with Dan's fixes resolved the docker pull issues I was seeing: 25 successful docker pulls of the holzi/quine-relay (1.5GB) image.
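(The repeated pulls can be scripted; a sketch, assuming direct docker CLI access and that removing the image between runs is acceptable:)

for i in $(seq 1 25); do
    docker pull holzi/quine-relay && docker rmi holzi/quine-relay || break
done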


Comment by Jira Bot
Created at 2019-02-19T18:27:26.404Z

illumos-joyent commit 20cd42d3954fb7e0280a3cf34a344db285355a0e (branch master, by Dan McDonald)

OS-7586 ipf sometimes freezes RFC 1323 transfers
Reviewed by: Jason King <jbk@joyent.com>
Reviewed by: Cody Peter Mello <melloc@writev.io>
Approved by: Cody Peter Mello <melloc@writev.io>