| Issue Type | Bug |
|---|---|
| Priority | 4 - Normal |
| Status | Resolved |
| Created at | 2017-02-23T01:42:56.000Z |
| Updated at | 2019-02-20T21:04:46.722Z |
| Created by | Todd Whiteman |
| Reported by | Todd Whiteman |
| Assigned to | Dan McDonald |
Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2019-02-20T21:04:46.708Z)
Fix Release: 2019-02-28 Mind Grapes (Release Date: 2019-02-28)
Docker pull has been having reliability problems when downloading images (see DOCKER-893); the failures occur in the Triton public cloud, but not in COAL.
In nightly-1 we could easily reproduce the docker pull failure in the imgapi zone, but not in the cloudapi zone. Comparing the two, I noticed that the cloudapi zone does not enable the firewall, while the imgapi zone does.
After disabling the firewall in the imgapi zone:

```
vmadm update $(vmadm list -o uuid alias=imgapi0) firewall_enabled=false
```

the docker pull problem goes away.

After enabling the firewall in the cloudapi zone:

```
vmadm update $(vmadm list -o uuid alias=cloudapi0) firewall_enabled=true
```

the docker pull problem reappears.
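To confirm which setting a zone currently has, something like this should work from the GZ (a sketch; json(1) ships with SmartOS):

```
# Print the zone's current firewall_enabled value.
vmadm get $(vmadm list -o uuid alias=imgapi0) | json firewall_enabled
```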
So something in the firewaller is causing these docker pull requests to stall/hang.
Note: Firewall has been enabled in imgapi since 2013, see HEAD-1744.
@accountid:624ba422247a4b0069202447 was asking for more details on how IMGAPI gets stuck, so I'll try to summarise DOCKER-893 below.
Here is an example URL of what IMGAPI is using (note that this is a short-lived URL, so it will likely be a 404 by the time someone reads this); this layer is a 64MB gzipped docker tarball for the centos:6 image:
Repro steps:
```
$ git clone https://github.com/joyent/node-docker-registry-client.git
$ cd node-docker-registry-client/
$ npm install
# At this stage you can rsync this repo (since it's all straight JS code)
# into a test zone, e.g. cloudapi or imgapi:
$ rsync -avz ../node-docker-registry-client "nightly-1:/zones/${ZONE_UUID}/root/root"
# Login into the zone and run:
$ node node-docker-registry-client/examples/v2/downloadImg.js node:latest
Repo: docker.io/node
Wrote manifest: node-latest.manifest
Downloaded layer 10 of 11: node-latest-10-a3ed95caeb02.layer
Downloaded layer 6 of 11: node-latest-6-33d67d70af20.layer
Downloaded layer 3 of 11: node-latest-3-a3ed95caeb02.layer
Downloaded layer 4 of 11: node-latest-4-a3ed95caeb02.layer
Downloaded layer 1 of 11: node-latest-1-a3ed95caeb02.layer
Downloaded layer 5 of 11: node-latest-5-da053401c2b1.layer
Downloaded layer 9 of 11: node-latest-9-fce5728aad85.layer
Downloaded layer 2 of 11: node-latest-2-05b24114aa8d.layer
Downloaded layer 11 of 11: node-latest-11-5040bd298390.layer
Downloaded layer 8 of 11: node-latest-8-76610ec20bf5.layer
node 11 layers [===========================================================================================>] 100% 247.18MB 22.34MB/s 11s
Downloaded layer 7 of 11: node-latest-7-9c1bc3c30371.layer
Digest: sha256:3f647bd166629c4d20491f03018b39cd12c90fdfe42086b2147a3ccddca5ffea
```
Running the above node command will stall (possibly after a few successful attempts) and the progress bar will be stuck, like below:
```
node 11 layers [========================>                    ] 96% 237.74MB 71.39MB/s 0s
# and then it will eventually timeout and produce an error like this:
downloadImg: error: first of 1 error: unexpected downloaded size: expected 51361210 bytes, downloaded 41466968 bytes
```
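Since the failure is intermittent, a loop along these lines (a sketch, not from the original report) can be left running until it reproduces:

```
# Re-run the example downloader until an attempt fails or stalls with the
# "unexpected downloaded size" error above.
n=0
while node node-docker-registry-client/examples/v2/downloadImg.js node:latest; do
    n=$((n + 1))
    echo "attempt $n succeeded; retrying..."
done
echo "failed after $((n + 1)) attempts"
```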
From @accountid:70121:6490ccfd-5932-4e7a-936d-554bdd3dc0d3
So if you can get that to happen, some /native command output might be useful:

```
/native/usr/bin/pfiles <pid-of-wedged-curl>
/native/usr/sbin/ipfstat -io
```

That FW removal cures this problem suggests something's screwy with ipf.
Seeing the 4-tuples of any TCP connections (from pfiles) vs. the state of the FW rules for that zone might be helpful. (Assuming of course, as you point out, it's not on the IMGAPI zone...)
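Putting those together, a quick inspection pass from inside the affected zone might look like this (a sketch; finding the pid via pgrep is an assumption):

```
# Find the wedged curl, dump its fds (including the TCP 4-tuples), then
# show the zone's view of the ipf rules.
pid=$(pgrep -n curl)
/native/usr/bin/pfiles "$pid"
/native/usr/sbin/ipfstat -io
```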
Here is the output from these commands (reproduced in nightly-1 today):
```
[root@0734c91d-0415-46f1-a041-12e0b6bb3b72 (nightly-1:imgapi0) ~]# pfiles 72756
72756:  curl -L -H Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsI
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0620 dev:566,25 ino:1822165736 uid:0 gid:7 rdev:6,2
      O_RDWR /dev/pts/2
      offset:73530
   1: S_IFREG mode:0644 dev:90,65561 ino:98497 uid:0 gid:0 size:21953024
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
      /var/tmp/t/blob
      offset:21953024
   2: S_IFCHR mode:0620 dev:566,25 ino:1822165736 uid:0 gid:7 rdev:6,2
      O_RDWR /dev/pts/2
      offset:73530
   3: S_IFDOOR mode:0444 dev:576,0 ino:141 uid:0 gid:0 rdev:575,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      door to nscd[17153]
   4: S_IFSOCK mode:0666 dev:574,0 ino:4786 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK
      SOCK_STREAM
      SO_KEEPALIVE,SO_SNDBUF(2097152),SO_RCVBUF(2098152)
      sockname: AF_INET 172.26.1.10  port: 34790
      peername: AF_INET 54.175.43.85  port: 443
   5: S_IFSOCK mode:0666 dev:574,0 ino:21751 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK
      SOCK_STREAM
      SO_KEEPALIVE,SO_SNDBUF(2097152),SO_RCVBUF(2098020)
      sockname: AF_INET 172.26.1.10  port: 48748
      peername: AF_INET 104.18.123.25  port: 443
[root@0734c91d-0415-46f1-a041-12e0b6bb3b72 (nightly-1:imgapi0) ~]# ipfstat -io
empty list for ipfilter(out)
empty list for ipfilter(in)
```
Hmmm, so nothing obviously local (unless the zone's ipfilter is being controlled by the global zone)... maybe `ipfstat -io -z <zonename>` from the global zone?
To reproduce this, log into the nightly-1 imgapi zone:

```
ssh nightly-1
sdc-login -l imgapi
```
Copy/paste this bash function into your imgapi shell:
https://gist.githubusercontent.com/twhiteman/a06e78fc43b830c6ebab29ab68ff60a5/raw/903b41be356790fea1938e4cb223eb1163f0b17b/getDockerBlob.sh
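For reference, a minimal sketch of what such a helper can look like (the gist above is authoritative; the token endpoint and the json(1) helper here are assumptions):

```
# Hypothetical getBlob: fetch a docker registry v2 pull token, then stream
# the blob to stdout, letting curl -L follow the redirect to the CDN.
getBlob() {
    local repo="${1%@*}" digest="${1#*@}"
    shift
    local token
    token=$(curl -sS "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${repo}:pull" | json token)
    curl -L -H "Authorization: Bearer ${token}" \
        "https://registry-1.docker.io/v2/${repo}/blobs/${digest}" "$@"
}
```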
then run:

```
getBlob "softwareplant/oracle@sha256:faadd00cf98e787e0377cc4bcf8c552437107c15f485b919961739fe42c65f59" -v > /var/tmp/blob
```
This is a 2.5GB image layer, so it will likely stall somewhere during the download; for example, in nightly-1 this command has currently downloaded only 400MB after 20 minutes. If it does not stall, retry.
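One way to spot a stall without watching by hand (a sketch; the 60-second idle window is arbitrary):

```
# Alert when /var/tmp/blob stops growing while the transfer should still
# be running; 60 seconds of no growth is treated as a stall.
prev=-1
while sleep 60; do
    cur=$(wc -c < /var/tmp/blob)
    if [ "$cur" -eq "$prev" ]; then
        echo "stalled at ${cur} bytes"
        break
    fi
    prev=$cur
done
```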
The curl process the above script generates has two TCP (HTTPS) connections open: one to an IP that corresponds to a hostname in the URL, and a second that appears to be Cloudflare (likely a redirect to a CDN). Traffic bursts, then stops moving. I can't tell where FWAPI's enforcement occurs; once I know that, I can debug the problem further.
Captured the FW rules according to ipf:
```
[root@headnode (nightly-1) ~]# ipfstat -G 0734c91d-0415-46f1-a041-12e0b6bb3b72 -io
pass out quick proto tcp from any to any flags S/SA keep state
pass out proto tcp from any to any
pass out proto udp from any to any keep state
pass out quick proto icmp from any to any keep state
pass out proto icmp from any to any
pass in quick proto icmp from any to any keep frags
pass in quick proto tcp from 172.25.1.0/24 to any keep frags
pass in quick proto udp from 172.25.1.0/24 to any keep frags
block in all
[root@headnode (nightly-1) ~]#
```
And I captured HTTPS traffic while the curl hangs (duplicate ACKs from us + still-arriving packets from the peer == most likely inbound droppage); it's on Manta here: /danmcd/stor/FWAPI-275.snoop
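A capture like that can be taken from the GZ with snoop (a sketch; the interface name net0 is an assumption):

```
# Capture the zone's HTTPS traffic to a file for later analysis.
snoop -d net0 -o /var/tmp/FWAPI-275.snoop port 443
```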
globe-theatre commit 6870b74d1a0a03c0241e03e1789e1d8622784f1f (branch master, by Todd Whiteman)
TRITON-1198 disable imgapi firewall for test suites that use docker pull
I'm seeing this with more generic transfers as well (scp in my case), though not as easily. Something is triggering packet drops, or at least one packet drop, and it's observable as fr_tcpinwindow() (in ipf) returning 0 when processing a packet that carries data (i.e. not just an ACK). There's a hardcoded value in this function, but it's not clear that's the reason the transfer fails; it could be correlation, not causation. In a perfect world this will lead to an illumos bug, or at least an OS ticket.
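A DTrace one-liner along these lines can watch for that from the global zone (a sketch; it assumes fbt can probe fr_tcpinwindow, i.e. the function is not inlined):

```
# Flag every segment that ipf's fr_tcpinwindow() rejects (returns 0).
dtrace -n 'fbt::fr_tcpinwindow:return /arg1 == 0/ {
    printf("fr_tcpinwindow() rejected a segment at %Y\n", walltimestamp);
}'
```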
Examining more recent ipfilter updates (such as the one an illumos community member has proposed porting into illumos), it appears the TCP state-maintenance checks have some flaws. There's a commit in FreeBSD:

https://github.com/freebsd/freebsd/commit/c636cf4e7104540ba66c035229138c1f5f52c374

that addressed ipfilter's sabotage of RFC 1323 window scaling. PARTS of this fix already appeared in illumos thanks to:

https://github.com/illumos/illumos-gate/commit/f17d2b412cd592f6085d3a1002cac662768ec1ae

but not ALL of them. This bug will complete the integration of the FreeBSD fix above into the ipf code illumos has carried since the early days of OpenSolaris.
With the proposed fix in place, packets for large transfers flow much more smoothly (observed via snoop -o /dev/null, watching the packet counter no longer stall every n-1000 packets), and the attached DTrace script produces NO OUTPUT.
Testing in nightly-1 using a platform with Dan's fixes resolved the docker pull issues I was seeing: 25 successful docker pulls of the holzi/quine-relay (1.5GB) image.
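A loop along these lines would exercise that (a sketch; it assumes a standard docker client pointed at the nightly-1 docker endpoint):

```
# 25 back-to-back pulls of the ~1.5GB image; remove it between runs so
# each pull transfers the layers again.
for i in $(seq 1 25); do
    docker pull holzi/quine-relay || { echo "pull $i failed"; break; }
    docker rmi holzi/quine-relay
    echo "pull $i OK"
done
```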
illumos-joyent commit 20cd42d3954fb7e0280a3cf34a344db285355a0e (branch master, by Dan McDonald)
OS-7586 ipf sometimes freezes RFC 1323 transfers
Reviewed by: Jason King <jbk@joyent.com>
Reviewed by: Cody Peter Mello <melloc@writev.io>
Approved by: Cody Peter Mello <melloc@writev.io>