OS-7620: Use -fstack-protector-strong when available

Details

Issue Type:Improvement
Priority:4 - Normal
Status:Resolved
Created at:2019-02-27T02:31:04.596Z
Updated at:2020-05-05T21:55:18.833Z

People

Created by:Robert Mustacchi [X]
Reported by:Robert Mustacchi [X]
Assigned to:Robert Mustacchi [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2019-08-19T18:40:03.077Z)

Fix Versions

2019-08-29 Zoo York (Release Date: 2019-08-29)

Related Links

Description

In illumos#5922, parts of the system were built with -fstack-protector. At that time, it was noted that the actual thing we wanted was -fstack-protector-strong; however, that wasn't supported in gcc 4.4.4. As we're looking at gcc7+ support, we should enable this option when we have a compiler that supports it.

Comments

Comment by Robert Mustacchi [X]
Created at 2019-07-26T15:35:13.857Z

While doing other testing, we found that there were issues with the qede module that basically cause it to break by no longer sending or receiving traffic and when combined with reptolines would lead to panics with firmware errors like:

> ::status
debugging crash dump vmcore.307 (64-bit) from haswell
operating system: 5.11 joyent_20190621T212044Z (i86pc)
git branch: master
git rev: a310e910d6ba982031d74ef0d3e1f346d0a47b3a
image uuid: (not set)
panic message: assertion failed: !1, file: ../../common/io/qede/579xx/drivers/ecore/ecore_int.c, line: 836
dump content: kernel pages only
> $C
fffffe000fb608e0 vpanic()
fffffe000fb60930 0xfffffffffbe34975()
fffffe000fb609b0 ecore_int_deassertion_aeu_bit+0x1c2(fffffe0bdf5c2090, ffffffffc0299f00, 84a8, fffffe000fb60a2a, 1)
fffffe000fb60a90 ecore_int_deassertion+0x343(fffffe0bdf5c2090, 1)
fffffe000fb60af0 ecore_int_attentions+0x8a(fffffe0bdf5c2090)
fffffe000fb60b40 ecore_int_sp_dpc+0x248(fffffe0bdf5c2090)
fffffe000fb60b70 qede_sp_handler+0x36(fffffe0bdf5c2090, fffffe0bdec11c68)
fffffe000fb60bc0 apix_dispatch_by_vector+0x8c(23)
fffffe000fb60c00 apix_dispatch_lowlevel+0x1c(23, 0)
fffffe000fae3a30 switch_sp_and_call+0x15()
fffffe000fae3a60 apix_setspl+0x22(fffffffffb8412b9)
fffffe000fae3aa0 do_splx+0xa9(fffffffffb86f25a)
fffffffffb88e8ed x86_rsb_stuff+0x11()
83fffffff9e890f3 [no mapping for address]

Comment by Robert Mustacchi [X]
Created at 2019-07-31T00:04:43.060Z
Updated at 2019-07-31T00:14:49.836Z

To test this, I put together a set of four different tests between on
haswell, a Intel(r) Xeon(r) CPU E3-1270 v3 @ 3.50GHz. I went into the
BIOS and made sure that TurboBoost was disabled so as to have a
consistent set of data.

Experimental Setup

I put together a set of four tests that were designed to fill in
different bits of information:

1. Calling stat in a tight loop on a file
2. Calling gethrvtime in a tight loop
3. Pinging another node over a back to back 25 GbE interface using the
qede driver
4. Testing some soft-real time processes by using a USB audio adapter to
play a converted mp3 to au file using the song 光 by Utada Hikaru.

To set these up, I first created a processor set, disabled interrupts on
that processor, and then bound my shell to it using the following
commands. Each test was performed a series of 10 times while bound to
the shell, with the exceptio of the audio test which was just done using
a separate session without being bound or with interrupts disabled. Set
up and running was as follows:

# psrset -c 1
# psradm -i 1
# psrset -b 1 $$
for i in $(seq 1 10); do ./gethrvtime 50000000 > vtime.out.$i; done
for i in $(seq 1 10); do ../stat ../file 5000000 > stat.out.$i; done
for i in $(seq 1 10); do ping -I 0.02 -sn 10.0.0.2 56 1000 > ping.out.$i; done

Note, for each test I ran it manually a few times to make sure that it
was good to go. The ping test was performed with haswell running on the
IP address 10.0.0.1/24 using qede1 with a QSFP28 cable to another system
using the same platform with the IP address 10.0.0.2/24.

The source code for the two programs is as follows with their Makefile.
These were built as 32-bit programs using gcc 4.7.3 and were built at
optimization -O0. For more details, see the Makefile:

# cat gethrvtime.c
/*
 * Perform a lot of hrtime calls in a row and see what happens.
 */

#include <err.h>
#include <stdlib.h>
#include <unistd.h>
#include <limits.h>
#include <sys/time.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int
main(int argc, const char *argv[])
{
	long long count, i;
	const char *errstr;
	hrtime_t start, end;

	if (argc != 2) {
		errx(EXIT_FAILURE, "missing args");
	}

	count = strtonum(argv[1], 0, LONG_MAX, &errstr);
	if (errstr != NULL) {
		errx(EXIT_FAILURE, "bad number of iterations: %s", errstr);
	}

	if ((count % 10000) != 0) {
		errx(EXIT_FAILURE, "count should be a multiple of 1000");
	}

	start = gethrtime();
	for (i = 0; i < count ; i++) {
		if (i != 0 && (i % 10000) == 0) {
			end = gethrtime();
			printf("%lld\n", end - start);
			start = gethrtime();
		}

		(void) gethrvtime();
	}

	end = gethrtime();
	printf("%lld\n", end - start);

	return (0);
}
# cat Makefile
#
#
#

CC = gcc
CFLAGS = -Wall -Wextra -O0
PROGS = stat gethrvtime

all: $(PROGS)

%: %.c
	$(CC) -o $@ $(CFLAGS) $<
# stat.c
/*
 * Perform a lot of stat calls in a row and see what happens.
 */

#include <err.h>
#include <stdlib.h>
#include <unistd.h>
#include <limits.h>
#include <sys/time.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int
main(int argc, const char *argv[])
{
	long long count, i;
	const char *errstr;
	hrtime_t start, end;

	if (argc != 3) {
		errx(EXIT_FAILURE, "missing args");
	}

	count = strtonum(argv[2], 0, LONG_MAX, &errstr);
	if (errstr != NULL) {
		errx(EXIT_FAILURE, "bad number of iterations: %s", errstr);
	}

	if ((count % 1000) != 0) {
		errx(EXIT_FAILURE, "count should be a multiple of 1000");
	}

	start = gethrtime();
	for (i = 0; i < count ; i++) {
		struct stat st;

		if (i != 0 && (i % 1000) == 0) {
			end = gethrtime();
			printf("%lld\n", end - start);
			start = gethrtime();
		}

		if (stat(argv[1], &st) != 0) {
			err(EXIT_FAILURE, "failed to stat %s", argv[1]);
		}
	}

	end = gethrtime();
	printf("%lld\n", end - start);

	return (0);
}

Experimental Results

I've tarred up all of the results and attached them to this ticket.
Doing a basic analysis found that most of the data was in the noise of
one another. While there were slight differences in these paths, it was
hard to attribute them. I also found no discernible difference in the
USB audio test.

To try and discern the differences, I first quantized all the data. It
all was bucketed to the same rough bucket, which meant that there
weren't any gross outliers. This was still true even when dividing the
time. I had trouble finding good software to feed into an arbitrary
equivalent of DTrace's lquantize and unfortunately my attempt to
brute force DTrace to do so didn't end well. Through some manual
inspection I mostly determined that it felt like it was all pretty
similar. While measurements such as the average aren't great or useful,
when the average of all the data is pretty tight and the before and
after are similar, it's hard to feel like this is creating a notable
difference.

Using the qede driver had an interesting side effect of determining
an issue in the driver. Though we've worked around it, which is partly
why some of the networking stack numbers weren't rerun a second time.


Comment by Jira Bot
Created at 2019-08-19T18:37:59.608Z

illumos-joyent commit edccf53a08a5dc2a1536d248367ab3aaf477ae60 (branch master, by Robert Mustacchi)

OS-7620 Use -fstack-protector-strong when available
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: John Levon <john.levon@joyent.com>
Approved by: John Levon <john.levon@joyent.com>