OS-7926: tests/pf_key/acquire-compare can fail to acquire $t1port

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2019-08-08T00:17:59.145Z
Updated at:2019-10-29T14:50:17.333Z

People

Created by:Former user
Reported by:Former user
Assigned to:Former user

Resolution

Implemented: The feature has been implemented, is checked into the tree and tested
(Resolution Date: 2019-10-29T14:50:17.318Z)

Description

I've often seen the pf_key/acquire-compare test fail, and I was able to capture the test log from one such run:

00:03:07.26 Warning, this trashes IPsec policy.
00:03:07.29 add net 10.21.12.0/24: gateway 10.21.12.5
00:03:07.29 add net 10.51.50.0/24: gateway 10.51.50.5
00:03:07.29 Waiting for pings...
00:03:17.31 Trying 10.19.84.3...
00:03:17.31 First local port ==
00:03:27.31 Trying 10.19.84.4...
00:03:27.32 Second local port == 40777
00:03:37.33 Trying 10.90.1.25...
00:03:37.34 Third local port == 48609
00:03:47.34 delete net 10.51.50.0/24: gateway 10.51.50.5
00:03:47.35 delete net 10.21.12.0/24: gateway 10.21.12.5
00:03:47.43 Checking for unique local port only in one ACQUIRE case.
00:03:47.43 egrep: RE error in |40777|48609: invalid regular expression
00:03:47.43 1,2d0
00:03:47.43 < SRC: AF_INET: port 40777, 172.20.0.52.
00:03:47.43 < SRC: AF_INET: port 40777, 172.20.0.52.
00:03:47.43 More than just the one unique port, , found in monitor output.

Note the RE error from egrep. It comes from the code that greps for any of t1port, t2port, and t3port. For some reason we were unable to get t1port: the 'First local port == ' line has no port after it, so the pattern handed to egrep starts with an empty alternative ('|40777|48609'). At first glance, nothing appears to prevent telnet from completing before pfiles runs.
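
To illustrate the failure mode, here is a minimal sketch of how the test could refuse to build the egrep pattern when a port is missing. It assumes the pattern is formed by joining the three port variables with '|', which is what the error message suggests; build_port_pattern and require_ports are hypothetical names, not functions from the actual test.

```shell
# Hypothetical helper: join the non-empty arguments with "|" so that an
# uncaptured port can never produce an empty alternative in the pattern.
build_port_pattern() {
    pattern=""
    for port in "$@"; do
        # Skip ports that were never captured (empty strings).
        [ -z "$port" ] && continue
        pattern="${pattern:+$pattern|}$port"
    done
    printf '%s\n' "$pattern"
}

# Hypothetical guard: return non-zero if any port argument is empty, so the
# test can report the real problem instead of an egrep RE error.
require_ports() {
    for port in "$@"; do
        if [ -z "$port" ]; then
            echo "missing local port" >&2
            return 1
        fi
    done
    return 0
}
```

With the values from the log above, require_ports "" 40777 48609 fails, which would let the test stop with a clear message before egrep is ever invoked.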

Comments

Comment by Former user
Created at 2019-10-16T16:10:22.568Z

I've seen the above failure, but rarely. Much more common is a diff failure caused by the ACQUIREs interleaving in the log file. Although it's slower, doing the pings sequentially seems to help there.


Comment by Former user
Created at 2019-10-16T22:43:17.152Z

The specific problem above is not that telnet has exited, but that pfiles is too fast: it's only capturing stdin/out/err before telnet has a chance to operate. Luckily, it looks like we can do something simple and specify the port by hand, avoiding any need for pfiles at all.
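
If pfiles has to stay, one way to tolerate this kind of race is to retry the lookup until it yields a port. This is only a sketch and not necessarily the fix that was adopted; wait_for_output is a hypothetical helper, and the command passed to it stands in for whatever the test already uses to extract the local port from pfiles.

```shell
# Hypothetical helper: run a command repeatedly until it prints something,
# or give up after the given number of attempts (one second apart).
# Usage: wait_for_output <tries> <command> [args...]
wait_for_output() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        # Ignore the command's exit status; only its output matters here.
        out=$("$@" 2>/dev/null) || true
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, t1port=$(wait_for_output 10 get_local_port "$pid") would poll for up to ten seconds, where get_local_port is an assumed wrapper around the existing pfiles parsing.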


Comment by Dan McDonald
Created at 2019-10-17T13:54:06.004Z

One can only specify the REMOTE port by hand in telnet. The local port is still selected by the kernel, and we want to confirm that the process's local port is accurately reflected by the ACQUIRE.  If an alternate available program (nc is in most distros, but not in -gate per se) can specify the 4-tuple on the CLI, that would eliminate the need for a pfiles check.  Otherwise, another bespoke C program will be needed.
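
As a rough sketch of the nc option: OpenBSD-style nc accepts -s for the source address and -p for the source port, so the full 4-tuple can be given on the command line. Whether the nc available on the test machines supports both flags is an assumption that needs checking; connect_4tuple and the NC override are hypothetical, and the addresses and ports in the usage note are illustrative only.

```shell
# Hypothetical wrapper: connect with an explicit local address and port so
# the test knows the local port in advance and needs no pfiles lookup.
# NC defaults to "nc"; override it to substitute another program or a stub.
connect_4tuple() {
    laddr=$1
    lport=$2
    raddr=$3
    rport=$4
    # -s sets the source address, -p the source port (OpenBSD-style nc).
    ${NC:-nc} -s "$laddr" -p "$lport" "$raddr" "$rport"
}
```

A call such as connect_4tuple 172.20.0.52 40777 10.19.84.3 23 would then be expected to produce an ACQUIRE whose SRC port is 40777, which is the property the test wants to confirm.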