OS-6116: epoll should better detect fd reassignment

Details

Issue Type: Bug
Priority: 4 - Normal
Status: Resolved
Created at: 2017-05-11T13:17:45.000Z
Updated at: 2018-02-27T20:55:57.545Z

People

Created by: Casey Bisson [X]
Reported by: Casey Bisson [X]
Assigned to: Patrick Mooney [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-07-24T15:23:39.000Z)

Fix Versions

2017-08-03 WARSAW PACT (Release Date: 2017-08-03)

Related Links

Description

Several customers, internal and otherwise, have reported seeing nginx periodically spew errors related to epoll_wait yielding EPERM. This is a little bit odd, as there isn't much in the way of access checking in the guts of epoll/devpoll itself. That logic does, however, allow errors from VOP_POLL handlers to trickle up. With the merging of OS-5941, attempts to epoll on regular files and directories will result in EPERM.

Since EPERM from epoll_wait is the expected result when a regular file ends up in the set, why is nginx stumbling over the problem? Certainly it wouldn't add a plain file fd intentionally, only to complain about it later. Another possibility does exist in this case: an active fd was closed without first being removed from the set, and its number was then quickly reassigned when nginx went to open a regular file. This touches on one of the wrinkles in the core design of epoll: it expects to act on the underlying struct file (our vnode_t would likely be the closest analog) rather than the fd itself. This places a premium on detecting when the resource backing an fd changes.
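
For illustration only, the sequence above can be sketched from the application side. This is not the fix that was checked in (that lives in the kernel) and it is not nginx code: the helper names close_tracked_fd and reuse_fd_demo are hypothetical, and the sketch assumes a single-threaded process. It shows a consumer removing an fd from the epoll set before closing it, so that the number, once handed back by a later open(), carries no leftover interest.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: drop the fd from the epoll set before closing it,
 * so a later open() that reuses the number cannot inherit stale interest.
 */
int
close_tracked_fd(int epfd, int fd)
{
	/* A failed delete (e.g. ENOENT if never added) is ignored here. */
	(void) epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
	return (close(fd));
}

/*
 * Walk through the close-then-reopen sequence from the description.
 * Reports the fd number before and after reuse, and returns the
 * epoll_wait() result for a zero-timeout poll (or -1 on setup failure).
 */
int
reuse_fd_demo(int *old_fd, int *new_fd)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out;
	int epfd, p[2], n;

	if ((epfd = epoll_create1(0)) < 0)
		return (-1);
	if (pipe(p) != 0) {
		(void) close(epfd);
		return (-1);
	}

	/* Register the read end of the pipe, as a server would a socket. */
	ev.data.fd = p[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) != 0) {
		(void) close(p[0]);
		(void) close(p[1]);
		(void) close(epfd);
		return (-1);
	}

	*old_fd = p[0];
	(void) close_tracked_fd(epfd, p[0]);

	/*
	 * open() returns the lowest unused descriptor, so the number that
	 * was just closed is handed straight back, now naming a directory.
	 */
	*new_fd = open("/", O_RDONLY);

	/* Poll without blocking; nothing should remain registered. */
	n = epoll_wait(epfd, &out, 1, 0);

	if (*new_fd >= 0)
		(void) close(*new_fd);
	(void) close(p[1]);
	(void) close(epfd);
	return (n);
}
```

Replacing the call to close_tracked_fd with a bare close() reproduces the problematic ordering described above. What epoll_wait then reports depends on how the platform detects the reassignment, which is the subject of this issue.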

Comments

Comment by Patrick Mooney [X]
Created at 2017-07-18T22:03:30.000Z

I believe the trouble detailed in illumos-joyent#139 is also related to this. In that case, it wasn't EPERM problems due to VREG resources sneaking into the set but rather unexpected successful events for freshly accept(2)ed sockets which occupy previously polled fd entries. The framework in question (https://github.com/oktal/pistache/) maintains a list of active sockets which it compares events against. When the accept(2) succeeds, placing a new resource into that fd, it's possible for the epoll_wait() to win the race against the framework socket registration code, emitting an event which won't match against the internal list of known sockets.


Comment by Bot Bot [X]
Created at 2017-07-21T01:12:54.000Z

illumos-joyent commit c9254ee (branch master, by Patrick Mooney)

OS-6116 epoll should better detect fd reassignment
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Jason King <jason.king@joyent.com>
Approved by: Jerry Jelinek <jerry.jelinek@joyent.com>