HEAD-2283

registrar: health checker threshold/period management doesn't work

Status: Open
Created: 2015-12-21T14:22:43.000-0500
Updated: 2017-11-07T16:38:53.478-0500

Description

The registrar health checker allows you to specify a "period" and "threshold". These appear to be based on the same-named parameters for amon probes:
https://github.com/joyent/sdc-amon/blob/master/docs/index.md

That is, a health check is considered to have failed only if "threshold" failures occur within the time period "period". However, the implementation looks completely wrong. When the checker starts its first check, it sets a timer for "period". When that timer fires, it sets another timer for "period", and that timer just clears the array of failures that we've detected so far. But that's it. So the end result of the entire mechanism is that the number of failures seen so far is cleared once, "2 * period" milliseconds after the first check starts. There are a bunch of things wrong with this:

I have not tested any of this – this is just my reading of the code.
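
For reference, here is a minimal sketch of the semantics described above, where a check only counts as failed once "threshold" failures land within the trailing "period". This is not the registrar code and has not been run against it. The class and method names (FailureWindow, record_failure, is_failed) are made up for illustration, and the handling of a failure exactly "period" milliseconds old is an assumption, since I have not checked how amon treats that boundary.

```python
from collections import deque


class FailureWindow:
    """Hypothetical helper: reports failure only when `threshold`
    failures fall within the trailing `period_ms` milliseconds."""

    def __init__(self, threshold, period_ms):
        if threshold < 1 or period_ms <= 0:
            raise ValueError("threshold must be >= 1 and period_ms > 0")
        self.threshold = threshold
        self.period_ms = period_ms
        self._failures = deque()  # failure timestamps, oldest first

    def _expire(self, now_ms):
        # Drop every failure that is at least period_ms old.
        # (Assumption: the window boundary is exclusive.)
        while self._failures and now_ms - self._failures[0] >= self.period_ms:
            self._failures.popleft()

    def record_failure(self, now_ms):
        """Record one failed check; return True if the threshold is met."""
        self._expire(now_ms)
        self._failures.append(now_ms)
        return len(self._failures) >= self.threshold

    def is_failed(self, now_ms):
        """Return True if the threshold is currently met at time now_ms."""
        self._expire(now_ms)
        return len(self._failures) >= self.threshold


# Example: 3 failures within 60 seconds are needed to trip the check.
window = FailureWindow(threshold=3, period_ms=60000)
first = window.record_failure(0)        # 1 failure in window
second = window.record_failure(10000)   # 2 failures in window
third = window.record_failure(65000)    # failure at t=0 has aged out
fourth = window.record_failure(66000)   # 3 failures within 60s
```

The point of the sketch is that old failures age out individually on every check, rather than the whole list being cleared once by a timer, so no timers are needed at all.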