INTRO(9E) Driver Entry Points INTRO(9E)
NAME
Intro - introduction to device driver entry points
DESCRIPTION
Section 9E of the manual describes the entry points and building blocks
that are used to build and implement all kinds of device drivers and kernel
modules. Oftentimes, modules and device drivers are talked about
interchangeably. The operating system is built around the idea of loadable
kernel modules. Device drivers are the primary type that we think about;
however, there are loadable kernel modules for file systems, STREAMS
devices, and even system calls!
The vast majority of this section focuses on documenting device (and
STREAMS) drivers. Device drivers are further broken down into different
categories depending on what they are targeting. For example, there are
dedicated frameworks for SCSI/SAS HBA drivers, networking drivers, USB
drivers, and then general character and block device drivers. While most
of the time we think about device drivers as corresponding to a piece of
physical hardware, there are also pseudo-device drivers which are device
drivers that provide functionality, but aren't backed by any hardware. For
example, dtrace(4D) and lofi(4D) are both pseudo-device drivers.
To help understand the relationship between these different types of
things, consider the following image:
+--------------------+
|                    |
|  Loadable Modules  |
|                    |
+--------------------+
 |                          +--------------+     +------------+
 |                          |              |     |            |
 +------------------------->| Cryptography | ... | Scheduling | ...
 |                          |              |     |            |
 |                          +--------------+     +------------+
 |   +----------------+     +--------------+     +--------------+
 |   |                |     |              |     |              |
 +-->| Device Drivers | ... | File Systems | ... | System Calls | ...
     |                |     |              |     |              |
     +----------------+     +--------------+     +--------------+
             v
 +-----------+
 |
 |   +------------+  +---------+     +-----------+     +-----------+
 +-->| Networking |->| igb(4D) | ... | mlxcx(4D) | ... | cxgbe(4D) | ...
 |   +------------+  +---------+     +-----------+     +-----------+
 |
 |   +-------+       +----------+     +-------------+     +----------+
 +-->|  HBA  |------>| smrt(4D) | ... | mpt_sas(4D) | ... | ahci(4D) | ...
 |   +-------+       +----------+     +-------------+     +----------+
 |
 |   +-------+       +--------------+     +----------+     +---------+
 +-->|  USB  |------>| scsa2usb(4D) | ... | ccid(4D) | ... | hid(4D) | ...
 |   +-------+       +--------------+     +----------+     +---------+
 |
 |   +---------+     +-------------+     +-------------+
 +-->| Sensors |---->| smntemp(4D) | ... | pchtemp(4D) | ...
 |   +---------+     +-------------+     +-------------+
 |
 +-------+-------------+-----------+----------+
         |             v           v          |
         v       +-----------+  +-----+       v
     +-------+   | Character |  | USB |   +-------+
     | Audio |   | and Block |  | HCD |   | Nexus | ...
     +-------+   |  Devices  |  +-----+   +-------+
                 +-----------+
The diagram above illustrates, at a high level, some of the relationships
described earlier. All device drivers are loadable modules that leverage
the modldrv(9S) structure and implement similar _init(9E) and _fini(9E)
entry points.
Some hardware implements more than one type of thing. The most common
example here would be a NIC that implements a temperature sensor or a
current sensor. Many devices also implement and leverage the kernel
statistics framework called "kstats". A device driver is not strictly
limited to only a single class of thing. For example, many USB client
devices are networking device drivers. In the subsequent sections we'll go
into the functions and structures that are related to creating the
different device drivers and their associated functions.
Kernel Initialization
To begin with, all loadable modules in the system are required to implement
three entry points. If these entry points are not present, then the module
cannot be installed in the system. These entry points are _init(9E),
_fini(9E), and _info(9E).
The _init(9E) entry point will be the first thing called in the module and
this is where any global initialization should be taken care of. Once all
global state has been successfully created, the driver should call
mod_install(9F) to actually register with the system. Conversely,
_fini(9E) is used to tear down the module. The driver uses mod_remove(9F)
to first remove the driver from the system and then it can tear down any
global state that was added there.
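The ordering described above can be sketched in plain C. The following is
a userland simulation rather than kernel code: demo_mod_install() and
demo_mod_remove() are hypothetical stand-ins for mod_install(9F) and
mod_remove(9F) (which take a struct modlinkage pointer in a real driver),
and every demo_* name is invented for illustration. It assumes, as those
functions do, that zero means success and a non-zero value is an error.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for module registration state. */
bool demo_registered;       /* true while "installed" in the system */
bool demo_global_ready;     /* true while global state exists */
int demo_install_result;    /* value the fake mod_install returns */
int demo_remove_result;     /* value the fake mod_remove returns */

/* Stand-in for mod_install(9F): 0 on success, an error otherwise. */
int
demo_mod_install(void)
{
    if (demo_install_result == 0)
        demo_registered = true;
    return (demo_install_result);
}

/* Stand-in for mod_remove(9F): may fail, e.g. while still in use. */
int
demo_mod_remove(void)
{
    if (demo_remove_result == 0)
        demo_registered = false;
    return (demo_remove_result);
}

/* Shape of _init(9E): global state first, then register. */
int
demo_init(void)
{
    int ret;

    demo_global_ready = true;
    ret = demo_mod_install();
    if (ret != 0)
        demo_global_ready = false;  /* undo on failure */
    return (ret);
}

/* Shape of _fini(9E): unregister first, tear down only on success. */
int
demo_fini(void)
{
    int ret = demo_mod_remove();

    if (ret == 0)
        demo_global_ready = false;
    return (ret);
}
```

The design point is the asymmetry: if removal fails, the module is still
registered, so its global state has to remain intact.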
While we mention global state here, this isn't widely used in most device
drivers. A device driver can have multiple instances instantiated, one for
each instance of a hardware device that is found and most state is tied to
those instances. We'll discuss that more in the next section.
The _info(9E) entry point these days just calls mod_info(9F) directly and
returns its result.
All of these entry points directly or indirectly require a struct
modlinkage. This structure is used by all types of loadable kernel modules
and is filled in with information that varies based on the type of module
one is creating. Here, everything that we're creating is going to use a
struct modldrv, which describes a loadable driver. Every device driver
will declare a static global variable for these and fill them out. They
are documented in modlinkage(9S) and modldrv(9S) respectively.
The following is an example of these structures borrowed from igc(4D):
static struct modldrv igc_modldrv = {
    .drv_modops = &mod_driverops,
    .drv_linkinfo = "Intel I226/226 Ethernet Controller",
    .drv_dev_ops = &igc_dev_ops
};
static struct modlinkage igc_modlinkage = {
    .ml_rev = MODREV_1,
    .ml_linkage = { &igc_modldrv, NULL }
};
From this there are a few important things to take away. A single kernel
module may implement more than one type of linkage, though this is the
exception and not the norm. The second part to call out here is that while
the drv_modops will be the same for all drivers that use the struct
modldrv, the drv_linkinfo and drv_dev_ops will be unique to each driver.
The next section discusses the struct dev_ops.
The Device Tree and Instances
Device drivers have a unique challenge that makes them different from other
kinds of loadable modules: there may very well be more than a single
instance of the hardware that they support. Consider a few examples: a
user can plug in two distinct USB mass storage devices or keyboards. A
system may have more than one NIC present or the hardware may expose
multiple physical ports as distinct devices. Many systems have more than
one disk device. Conversely, if a given piece of hardware isn't present
then there's no reason for the driver for it to be loaded. There is
nothing that the Intel 1 GbE Ethernet NIC driver, igb(4D), can do if there
are no supported devices plugged in.
Devices are organized into a tree that is full of parent and child
relationships. This tree is what you see when you run prtconf(8). As an
example, a USB device is plugged into a port on a hub, which may be plugged
into another hub, and then is eventually plugged into a PCI device that is
the USB host controller, which itself may be under a PCI-PCI bridge, and
this chain continues all the way up to the root of the tree, which we call
"rootnex". Device drivers that can enumerate children and provide
operations for them are called "nexus" drivers.
The system automatically fills out the device tree through a combination of
built-in mechanisms and through operations on other nexus drivers. When a
new hardware unit is discovered, a dev_info_t structure, the device
information, is created for it and it is linked into the tree. Generally,
the system can then use information embedded in the device to determine
what driver is responsible for the piece of hardware through the use of
the "compatible" property, which the system and nexus drivers set up on
their children. For example, PCI and PCIe drivers automatically set up
the compatible property based on information discovered in PCI
configuration space like the device's vendor, device ID, and class IDs.
The same is true of USB.
When a device driver is packaged, it contains metadata that indicates which
devices it supports. For example, the aforementioned igb driver will have
a rule that it matches "pciex8086,10a7". When the kernel discovers a
device with this alias present, it will know that it should assign it to
the igb driver and then it will assign the dev_info_t structure a new
instance number.
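To make the alias idea concrete, here is a minimal userland sketch of how
a vendor and device ID pair maps to a string of the form shown above. The
function names are hypothetical, and this covers only the simple
"pciexVENDOR,DEVICE" form; the real compatible property carries additional
forms that are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a PCIe vendor/device pair as an alias string, using lowercase
 * hexadecimal without zero padding, e.g. "pciex8086,10a7".
 * Returns the snprintf(3C) result.
 */
int
demo_pcie_alias(char *buf, size_t len, uint16_t vendor, uint16_t device)
{
    return (snprintf(buf, len, "pciex%x,%x", (unsigned)vendor,
        (unsigned)device));
}

/* Check whether a driver's alias matches a discovered device. */
bool
demo_alias_matches(const char *alias, uint16_t vendor, uint16_t device)
{
    char buf[32];
    int len = demo_pcie_alias(buf, sizeof (buf), vendor, device);

    return (len > 0 && (size_t)len < sizeof (buf) &&
        strcmp(buf, alias) == 0);
}
```

With this sketch, the vendor ID 0x8086 and device ID 0x10a7 produce the
igb alias quoted earlier, while a different device ID does not match.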
To emphasize here, each time the device is discovered in the tree, it will
have an independent instance number and an independent dev_info_t that
accompanies it. Each instance has an independent lifetime too. The most
obvious way to think about this is with something that can be physically
removed while the system is on, like a USB device. Just because you pull
one USB keyboard doesn't mean it impacts the other one there. They are
inherently different devices. If they were plugged into the same hub and
the hub was removed, then both would be removed; however, each would still
be acted on independently.
Here is a slimmed down example from a system's prtconf(8) output:
Oxide,Gimlet (driver name: rootnex)
    scsi_vhci, instance #0 (driver name: scsi_vhci)
    pci, instance #0 (driver name: npe)
        pci1022,1480, instance #13 (driver name: amdzen_stub)
        pci1022,164f
        pci1022,1482
        pci1de,fff9, instance #0 (driver name: pcieb)
            pci1344,3100, instance #4 (driver name: nvme)
                blkdev, instance #10 (driver name: blkdev)
        pci1022,1482
        pci1022,1482
        pci1de,fff9, instance #1 (driver name: pcieb)
            pci1b96,0, instance #7 (driver name: nvme)
                blkdev, instance #0 (driver name: blkdev)
        pci1de,fff9, instance #2 (driver name: pcieb)
            pci1b96,0, instance #8 (driver name: nvme)
                blkdev, instance #4 (driver name: blkdev)
        pci1de,fff9, instance #3 (driver name: pcieb)
            pci1b96,0, instance #10 (driver name: nvme)
                blkdev, instance #1 (driver name: blkdev)
From this we can see that there are multiple instances of the NVMe (nvme),
PCIe bridge (pcieb), and generic block device (blkdev) drivers present.
Each of these has its own dev_info_t and has its various entry points
called in parallel. With that, let's dig into the specifics of what the
struct dev_ops actually is and the different operations to be aware of.
struct dev_ops
The device operations structure, struct dev_ops, controls all of the basic
entry points that a loadable device driver contains. This is something
that every driver has to implement, no matter the type. The most important
things that will be present are the devo_attach and devo_detach members,
which are used to create and destroy instances of the driver, and then a
pointer to any subsequent operations that exist, such as the devo_cb_ops,
which is used for character and block device drivers, and the
devo_bus_ops, which is used for nexus drivers.
Attach and detach are the most important entry points in this structure.
Attach can practically be thought of as the "main" function entry point for
a device driver. This is where any initialization of the instance will
occur. This would include many traditional things like setting up access
to registers, allocating and assigning interrupts, and interfacing with the
various other device driver frameworks such as mac(9E).
The actions taken here are generally device-specific, though certain
classes of devices (e.g. PCI, USB, etc.) will have overlapping concerns.
In addition, this is where the driver will take care of creating anything
like a minor node, which userland software will use to access it if it's a
character or block device driver.
There is generally a per-instance data structure that a driver creates. It
may do this by calling kmem_zalloc(9F) and assigning the structure with the
ddi_set_driver_private(9F) function or it may use the DDI's soft state
management functions rooted in ddi_soft_state_init(9F). A driver should
tie as much state to the instance as possible. There should not be
anything like a fixed size global array of possible instances. Someone
usually finds a way to attach many more instances of some type of hardware
than you might expect!
The attach(9E) and detach(9E) entry points both have a unique command
argument that is used to describe a specific action that is going on. This
action may be a normal attach or it could be related to putting the system
into the ACPI S3 sleep or similar state with the suspend and resume
commands.
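The way a driver dispatches on that command argument can be sketched as
follows. This is a userland model, not driver code: the DEMO_* constants
and demo_* types are hypothetical stand-ins that mirror the kernel's
DDI_ATTACH, DDI_RESUME, DDI_DETACH, and DDI_SUSPEND commands and the
DDI_SUCCESS and DDI_FAILURE return values, and a real entry point takes a
dev_info_t pointer rather than the invented state structure used here.

```c
/* Stand-ins for ddi_attach_cmd_t and ddi_detach_cmd_t. */
typedef enum { DEMO_ATTACH, DEMO_RESUME } demo_attach_cmd_t;
typedef enum { DEMO_DETACH, DEMO_SUSPEND } demo_detach_cmd_t;

#define DEMO_SUCCESS    0
#define DEMO_FAILURE    (-1)

/* Hypothetical per-instance state. */
typedef struct demo_state {
    int ds_attached;
    int ds_suspended;
} demo_state_t;

int
demo_attach(demo_state_t *ds, demo_attach_cmd_t cmd)
{
    switch (cmd) {
    case DEMO_ATTACH:
        /* Normal attach: create the instance. */
        if (ds->ds_attached)
            return (DEMO_FAILURE);
        ds->ds_attached = 1;
        ds->ds_suspended = 0;
        return (DEMO_SUCCESS);
    case DEMO_RESUME:
        /* Resume only makes sense for a suspended instance. */
        if (!ds->ds_attached || !ds->ds_suspended)
            return (DEMO_FAILURE);
        ds->ds_suspended = 0;
        return (DEMO_SUCCESS);
    default:
        /* Unknown commands are rejected rather than ignored. */
        return (DEMO_FAILURE);
    }
}

int
demo_detach(demo_state_t *ds, demo_detach_cmd_t cmd)
{
    if (!ds->ds_attached)
        return (DEMO_FAILURE);

    switch (cmd) {
    case DEMO_DETACH:
        ds->ds_attached = 0;
        ds->ds_suspended = 0;
        return (DEMO_SUCCESS);
    case DEMO_SUSPEND:
        if (ds->ds_suspended)
            return (DEMO_FAILURE);
        ds->ds_suspended = 1;
        return (DEMO_SUCCESS);
    default:
        return (DEMO_FAILURE);
    }
}
```

Failing on any command the driver does not recognize is the conservative
choice: a driver that does not implement suspend and resume can simply
leave those cases out and fall through to the default.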
The following are the common struct dev_ops entry points that most drivers
end up having to think a little bit about:
attach(9E)
detach(9E)
getinfo(9E)
quiesce(9E)
Briefly, the getinfo(9E) entry point is used to map between instances of a
device driver and the minor nodes it creates. Drivers that participate in
a framework like the SCSI HBA, Networking, or related don't usually end up
implementing this. However, drivers that manually create minor nodes
generally do. The quiesce(9E) entry point is used as part of the fast
reboot operation. It is basically intended to stop and/or reset the
hardware and discard any ongoing I/O. Pseudo-device drivers and drivers
which do not perform I/O can use the symbol `ddi_quiesce_not_needed'
in lieu of a standard implementation.
In addition, the following entry points exist, but are less commonly
required, generally because the system takes care of them, as is the case
for probe(9E):
identify(9E)
power(9E)
probe(9E)
For more information on the structure, see also dev_ops(9S). The following
are a few examples of the struct dev_ops structure from a few drivers. We
recommend using the C99 style for all new instances.
static struct dev_ops ksensor_dev_ops = {
    .devo_rev = DEVO_REV,
    .devo_refcnt = 0,
    .devo_getinfo = ksensor_getinfo,
    .devo_identify = nulldev,
    .devo_probe = nulldev,
    .devo_attach = ksensor_attach,
    .devo_detach = ksensor_detach,
    .devo_reset = nodev,
    .devo_power = ddi_power,
    .devo_quiesce = ddi_quiesce_not_needed,
    .devo_cb_ops = &ksensor_cb_ops
};
static struct dev_ops igc_dev_ops = {
    .devo_rev = DEVO_REV,
    .devo_refcnt = 0,
    .devo_getinfo = NULL,
    .devo_identify = nulldev,
    .devo_probe = nulldev,
    .devo_attach = igc_attach,
    .devo_detach = igc_detach,
    .devo_reset = nodev,
    .devo_quiesce = ddi_quiesce_not_supported,
    .devo_cb_ops = &igc_cb_ops
};
static struct dev_ops pchtemp_dev_ops = {
    .devo_rev = DEVO_REV,
    .devo_refcnt = 0,
    .devo_getinfo = nodev,
    .devo_identify = nulldev,
    .devo_probe = nulldev,
    .devo_attach = pchtemp_attach,
    .devo_detach = pchtemp_detach,
    .devo_reset = nodev,
    .devo_quiesce = ddi_quiesce_not_needed
};
Character and Block Operations
In the history of UNIX, the most common device drivers that were created
were for block and character devices. The interfaces in block and
character devices are usually in service of common I/O patterns that the
system exposes. For example, when you call open(2), ioctl(2), or read(2)
on a device, it goes through the device's corresponding entry point here.
Both block and character devices operate on the shared struct cb_ops
structure, with different members being expected for each of them. While
they both require that someone implement the cb_open and cb_close members,
block devices perform I/O through the strategy(9E) entry point and support
the dump(9E) entry point for kernel crash dumps, while character devices
implement the more historically familiar read(9E), write(9E), and the
devmap(9E) entry point for supporting memory-mapping.
While the device operations structures worked with the dev_info_t structure
and there was one per-instance, character and block operations work with
minor nodes: named entities that exist in the file system. UNIX has long
had the idea of a major and minor number that is encoded in the dev_t which
is embedded in the file system, which is what you see in the st_rdev member
of the stat structure when you call stat(2). The major number is assigned
to the driver as a whole, not an instance. The minor number space is
shared between all instances of a driver. Minor node numbers are assigned
by the driver when it calls ddi_create_minor_node(9F) to create a minor
node and when one of its character or block entry points is called, it will
get this minor number back and it must translate it to the corresponding
instance on its own.
A special property of the open(9E) entry point is that it can change the
minor number a client gets during its call to open which it will use for
all subsequent calls. This is called a "cloning" open. Whether this is
used or not depends on the type of driver that you are creating. For
example, many pseudo-device drivers like DTrace will use this so each
client has its own state. Similarly, devices that have certain internal
locking and transaction schemes will give each caller a unique minor. The
ccid(4D) and nvme(4D) drivers are examples of this. However, many drivers
will have just a single minor node per instance and just say that the minor
node's number is the instance number, making it very simple to figure out
the mapping. When it's not so simple, often an AVL tree or some other
structure is used to help map this together.
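Between the trivial case (minor number equals instance number) and a full
lookup structure, one common middle ground is to pack the instance and a
per-instance index into the minor number. The following is a minimal
sketch of that idea only; the demo_* names and the 8-bit split are
assumptions made for illustration and are not taken from any particular
driver. In a real driver the minor number would come from getminor(9F).

```c
#include <stdint.h>

/* Hypothetical split: low 8 bits index a node within an instance. */
#define DEMO_MINOR_BITS     8
#define DEMO_MINOR_MASK     ((1U << DEMO_MINOR_BITS) - 1)

/* Build a minor number from an instance and a per-instance index. */
uint32_t
demo_minor_encode(uint32_t instance, uint32_t index)
{
    return ((instance << DEMO_MINOR_BITS) | (index & DEMO_MINOR_MASK));
}

/* Recover the instance number from a minor number. */
uint32_t
demo_minor_to_instance(uint32_t minor)
{
    return (minor >> DEMO_MINOR_BITS);
}

/* Recover the per-instance index from a minor number. */
uint32_t
demo_minor_to_index(uint32_t minor)
{
    return (minor & DEMO_MINOR_MASK);
}
```

A fixed split like this limits how many minors each instance can have, so
it is a poor fit for cloning opens with an unbounded number of clients,
which is where a lookup structure becomes the better tool.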
The following entry points are generally used for character devices:
ioctl(9E)
The I/O control or ioctl entry point is used extensively throughout
the system to perform different kinds of operations. These
operations are often driver specific, though there are also some
common operations that are used across multiple devices like the
disk operations described in dkio(4I) or the ioctls that are used
under the hood by cfgadm(8) and friends.
Whether a driver supports ioctls is up to the driver. If it does,
it is up to the driver to always perform any requisite privilege
and permission checking as well as take care in copying in and out
any kind of memory from the user process through calls like
ddi_copyin(9F) and ddi_copyout(9F).
The ioctl interface gives the driver writer great flexibility to
create equally useful or hard to consume interfaces. When crafting
a new committed interface over an ioctl, take care to ensure there
is an ability to version the structure or use something that has
more flexibility like a nvlist_t. See the `Copying Data to and
from Userland' section of Intro(9F) for more information.
read(9E), write(9E), aread(9E), and awrite(9E)
These are the classic I/O routines of the system. A driver's read
and write routines operate on a uio(9S) structure which describes
the I/O that is occurring, the offset into the device that the I/O
should occur at, and has various flags that describe properties of
the I/O request, such as whether or not it is a non-blocking
request.
The majority of device drivers that implement these entry points
are using them to create some kind of file-like abstraction for a
device. For example, the ccid(4D) driver uses these interfaces for
submitting commands and reading responses back from an underlying
device.
For most use cases read(9E) and write(9E) are sufficient; however,
aread(9E) and awrite(9E) are versions that tie into the kernel's
asynchronous I/O engine.
chpoll(9E)
This entry point allows a device to be polled by user code for an
event of interest and connects through the kernel to different
polling mechanisms such as poll(2), port_get(3C), and many others.
Currently this interface only allows a driver to define the classic
poll style events such as POLLIN, POLLOUT, and POLLHUP. The exact
semantics of these are up to the driver; however, it is expected
that the read and write oriented semantics of the various events
will be honored by the device driver.
devmap(9E) and segmap(9E)
These are entry points that are used to set up memory mappings for
a device and replace the older mmap(9E) entry point. When a
process calls mmap(2) on a device, it'll reach these, starting
with the devmap(9E) entry point. The driver is responsible for
confirming that the mapping request and its semantics are
sensible, after which it will set up memory for consumption. The
devmap(9E) manual page has more details on the specifics here and
the related entry points that can be implemented as part of the
devmap_callback_ctl(9S) structure such as devmap_access(9E). The
segment mapping is an optional part that provides some additional
controls for a driver such as assigning certain mapping attributes
or wanting to maintain separate contexts for different mappings.
See segmap(9E) for more information. It is common for drivers to
just provide a devmap(9E) entry point.
prop_op(9E)
This entry point is used for drivers to manage and deal with
property creation. While this is its own entry point, most callers
can just specify ddi_prop_op(9F) for this and don't need any
special handling.
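Returning to the ioctl(9E) advice above about versioning committed
interfaces, the following is one hedged sketch of what a versioned
argument structure and its validation might look like. Everything here is
hypothetical: the demo_ioc_req_t layout, its field names, and the flag
values are invented for illustration, and memcpy(3C) stands in for the
ddi_copyin(9F) call that a real driver would use to copy from userland.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IOC_VERSION        1
#define DEMO_IOC_FLAG_VERBOSE   0x1
#define DEMO_IOC_FLAGS_KNOWN    (DEMO_IOC_FLAG_VERBOSE)

/* Hypothetical ioctl argument; fixed-width types keep the layout stable. */
typedef struct demo_ioc_req {
    uint32_t dir_version;   /* must be DEMO_IOC_VERSION */
    uint32_t dir_flags;     /* only known flags may be set */
    uint64_t dir_value;     /* request payload */
} demo_ioc_req_t;

/*
 * Copy in and validate a request. Returns 0 on success or an errno
 * value, which is what an ioctl(9E) implementation would return.
 */
int
demo_ioc_validate(const void *raw, size_t len, demo_ioc_req_t *out)
{
    if (len < sizeof (*out))
        return (EINVAL);

    /* Stand-in for ddi_copyin(9F) from the caller's address space. */
    (void) memcpy(out, raw, sizeof (*out));

    /* Reject versions this driver does not understand. */
    if (out->dir_version != DEMO_IOC_VERSION)
        return (ENOTSUP);

    /* Reject unknown flags so they can be given meaning later. */
    if ((out->dir_flags & ~DEMO_IOC_FLAGS_KNOWN) != 0)
        return (EINVAL);

    return (0);
}
```

Rejecting unknown versions and unknown flag bits up front is what leaves
room to extend the structure later without silently changing behavior for
existing callers.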
The following entry points are used only for block devices:
strategy(9E)
A driver's strategy entry point is used to actually perform I/O as
described by the buf(9S) structure. It is responsible for
allocating all resources and then initiating the actual request.
The actual request will finish potentially asynchronously through
calls to biodone(9F) or bioerror(9F). HBA or blkdev-based drivers
do not usually end up implementing this interface.
dump(9E)
A driver's dump implementation is used when the operating system
has had a fatal error and is trying to persist a crash dump to
disk. This is a delicate operation as the system has already
failed, which means many normal operations like interrupt handlers,
timeouts, and blocking will no longer work.
In general, the print(9E) entry point for block devices is vestigial and
users should fill in nodev(9F) there instead.
The following are some examples of different character device operations
structures that drivers have employed. Note that using C99 structure
definitions is preferred:
static struct cb_ops ksensor_cb_ops = {
    .cb_open = ksensor_open,
    .cb_close = ksensor_close,
    .cb_strategy = nodev,
    .cb_print = nodev,
    .cb_dump = nodev,
    .cb_read = nodev,
    .cb_write = nodev,
    .cb_ioctl = ksensor_ioctl,
    .cb_devmap = nodev,
    .cb_mmap = nodev,
    .cb_segmap = nodev,
    .cb_chpoll = nochpoll,
    .cb_prop_op = ddi_prop_op,
    .cb_flag = D_MP,
    .cb_rev = CB_REV,
    .cb_aread = nodev,
    .cb_awrite = nodev
};
static struct cb_ops vio9p_cb_ops = {
    .cb_rev = CB_REV,
    .cb_flag = D_NEW | D_MP,
    .cb_open = vio9p_open,
    .cb_close = vio9p_close,
    .cb_read = vio9p_read,
    .cb_write = vio9p_write,
    .cb_ioctl = vio9p_ioctl,
    .cb_strategy = nodev,
    .cb_print = nodev,
    .cb_dump = nodev,
    .cb_devmap = nodev,
    .cb_mmap = nodev,
    .cb_segmap = nodev,
    .cb_chpoll = nochpoll,
    .cb_prop_op = ddi_prop_op,
    .cb_str = NULL,
    .cb_aread = nodev,
    .cb_awrite = nodev,
};
static struct cb_ops bd_cb_ops = {
    bd_open,            /* open */
    bd_close,           /* close */
    bd_strategy,        /* strategy */
    nodev,              /* print */
    bd_dump,            /* dump */
    bd_read,            /* read */
    bd_write,           /* write */
    bd_ioctl,           /* ioctl */
    nodev,              /* devmap */
    nodev,              /* mmap */
    nodev,              /* segmap */
    nochpoll,           /* poll */
    bd_prop_op,         /* cb_prop_op */
    0,                  /* streamtab */
    D_64BIT | D_MP,     /* Driver compatibility flag */
    CB_REV,             /* cb_rev */
    bd_aread,           /* async read */
    bd_awrite           /* async write */
};
Networking Drivers
Networking device drivers come in many forms and flavors. They may
interface to the host via PCIe, USB, be a pseudo-device, or use something
entirely different like SPI (Serial Peripheral Interface). The system
provides a dedicated networking interface driver framework that is
documented in mac(9E). This framework is sometimes also referred to as
GLDv3 (Generic LAN Device version 3).
All networking drivers will still implement a basic struct dev_ops and a
minimal struct cb_ops. The mac(9E) framework takes care of implementing
all of the standard character device entry points at the end of the day and
instead provides a number of different networking-specific entry points
that take care of things like getting and setting properties, installing
and removing MAC addresses and filters, and actually transmitting and
providing callbacks for receiving packets.
Each instance of a device driver will generally have a separate
registration with mac(9E). In other words, there is usually a one to one
relationship between a driver having its attach(9E) entry point called and
it registering with the mac(9E) framework.
STREAMS Modules
STREAMS modules are a historical way to provide certain services in the
kernel. For networking device drivers, instead see the prior section and
mac(9E). Conceptually, STREAMS break things into queues, with one side
being designed for a module to read data and another side for it to write
or produce data. These modules are arranged in a stack, with additional
modules being pushed on for additional processing. For example, the TTY
subsystem has a serial console as a base STREAMS module, but it then pushes
on additional modules like the pseudo-terminal emulation (ptem(4M)), the
standard line discipline (ldterm(4M)), etc.
STREAMS drivers don't use the normal character device entry points (though
sometimes they do define them) or even the struct modldrv. Instead they
use the struct modlstrmod which is discussed in modlstrmod(9S), which in
turn requires one to fill out the fmodsw(9S), streamtab(9S), and qinit(9S)
structures. The latter of these has two of the more common entry points:
put(9E)
srv(9E)
These entry points are used when different kinds of messages are received
by the device driver on a queue. In addition, the qinit(9S) structure
defines an alternative set of entry points for open(9E) and close(9E), as
STREAMS modules' open and close routines all operate in the context of a
given queue_t. There are other differences here. An ioctl is not a
dedicated entry point, but rather a specific message type (M_IOCTL) that is
received in a driver's put(9E) routine.
Finally, it's worth noting the mt-streams(9F) manual page which discusses
several concurrency related considerations for STREAMS related drivers.
HBA Drivers
Host bus adapters are used to interface with the various SCSI and SAS
controllers. Like with networking, the kernel provides a framework under
the name of SCSA. HBA drivers still often implement character device entry
points; however, they generally end up calling into shared framework entry
points for open(9E), ioctl(9E), and close(9E). For several of the concepts
related to the third version of the framework, see iport(9).
The following entry points are associated with HBA drivers:
tran_abort(9E)
tran_bus_reset(9E)
tran_dmafree(9E)
tran_getcap(9E)
tran_init_pkt(9E)
tran_quiesce(9E)
tran_reset(9E)
tran_reset_notify(9E)
tran_setup_pkt(9E)
tran_start(9E)
tran_sync_pkt(9E)
tran_tgt_free(9E)
tran_tgt_init(9E)
tran_tgt_probe(9E)
In addition to these, when using SCSAv3 with iports, drivers will call
scsi_hba_iport_register(9F) to create various iports. This has the unique
effect of causing the driver's top-level attach(9E) entry point to be
called again, but referring to the iport instead of the main hardware
instance.
USB Drivers
The kernel provides a framework for USB client devices to access various
USB services such as getting access to device and configuration
descriptors, issuing control, bulk, interrupt, and isochronous requests,
and being notified when they are removed from the system. Generally a USB
device driver leverages a framework of some kind, like mac(9E), in addition
to the USB pieces. As such, there are no entry points specific to USB
device drivers; however, there are plenty of provided functions.
To get started with a USB device driver, one will generally perform some of
the following steps:
1. Register with the USB framework by calling usb_client_attach(9F).
2. Ask the kernel to fetch all of the device and class descriptors that
   are appropriate with the usb_get_dev_data(9F) function.
3. Parse the relevant descriptors to figure out which endpoints to
   attach.
4. Open up pipes to the specific USB endpoints by using
   usb_lookup_ep_data(9F), usb_ep_xdescr_fill(9F), and usb_pipe_xopen(9F).
5. Proceed with the rest of device initialization and service.
Sensors
Many devices embed sensors in them, such as a networking ASIC that tracks
its junction temperature. The kernel provides the ksensor(9E) (kernel
sensor) framework to allow device drivers to implement sensors with a
minimal set of callback functions. Any device driver, whether it's
providing services through another framework or not, can implement the
ksensor operations. Drivers do not need to implement any character device
operations directly. They are instead provided via the ksensor(4D) driver.
A driver registers with the ksensor framework during its attach(9E) entry
point and must implement the functions described in ksensor_ops(9E) for
each sensor that it creates. These interfaces include:
kso_kind(9E)
kso_scalar(9E)
Virtio Drivers
The kernel provides an uncommitted interface for Virtio device drivers,
which is discussed in some detail in uts/common/io/virtio/virtio.h. A
client device driver will register with the framework and then use that
registration to begin feature and interrupt negotiation. As part of that,
it is given the ability to set up virtqueues which can be used for
communicating to and from the hypervisor.
Kernel Statistics
Drivers have the ability to export kstats (kernel statistics) that will
appear in the kstat(8) command. Any kind of module in the system can
create and register a kstat; it is not strictly tied to anything like a
dev_info_t. kstats come in different types. The most common kstat type is
KSTAT_TYPE_NAMED, which allows for multiple, typed name-value pairs to be
part of the stat. This is what the kernel uses under the hood for many
things such as the various mac(9E) statistics that are managed on behalf
of drivers.
To create a kstat, a driver utilizes the kstat_create(9F) function, after
which it has a chance to set up the kstat and make choices about which
entry points it will implement. A kstat will not be made visible until
the caller calls kstat_install(9F) on it. The two entry points that a
driver may implement are:
ks_snapshot(9E)
ks_update(9E)
First, let's discuss the ks_update(9E) entry point. A kstat may be updated
in one of two ways: either by having its ks_update(9E) function called or
by having the system update information as it goes in the kstat's data.
One would use the former when it involves doing something like going out to
hardware and reading registers, whereas the latter approach might be used
when operations can be tracked as part of a normal flow, such as the number
of errors or particular requests a driver has encountered. The
ks_snapshot(9E) entry point is not as commonly used by comparison and
allows a caller to interpose on the data marshalling process for copying
out to userland.
Upgradable Firmware Modules
The UFM (Upgradable Firmware Module) system in the kernel allows a device
driver to provide information about the firmware modules that are present
on a device and is generally used as supplementary information about a
device. The UFM framework allows a driver to declare a given number of
modules that exist on a given dev_info_t. Each module has some number of
slots with different versions. This information is automatically exported
into various consumers such as fwflash(8), the Fault Management
Architecture, and the ufm(4D) driver's specific ioctls.
A driver fills in the operations vector discussed in ddi_ufm(9E) and
registers it with the kernel by calling ddi_ufm_init(9F). The entry points
include:
ddi_ufm_op_getcaps(9E)
ddi_ufm_op_nimages(9E)
ddi_ufm_op_fill_image(9E)
ddi_ufm_op_fill_slot(9E)
ddi_ufm_op_readimg(9E)
The ddi_ufm_op_getcaps(9E) entry point describes the capabilities of the
device and what other entry points the kernel and callers can expect to
exist. The ddi_ufm_op_nimages(9E) entry point tells the system how many
images there are and if it is not implemented, then the system assumes
there is a single image. The ddi_ufm_op_fill_image(9E) and
ddi_ufm_op_fill_slot(9E) entry points are used to fill in information about
images and slots respectively, while the ddi_ufm_op_readimg(9E) entry point
is used to read an image from the device for the operating system. That
entry point is often supported when dealing with EEPROMs as many devices do
not have a way of retrieving the actual current firmware.
USB Host Interface Drivers
On the opposite side from USB device drivers are the device drivers that
make the USB abstractions work: USB host interface controllers. The kernel
provides a private framework for these, which is discussed in
usba_hcdi(9E). An HCDI driver is a character device driver and ends up
also instantiating a root hub as part of its operation and forwards many of
its open, close, and ioctl routines to the corresponding usba hubdi
functions.
To get started with the framework, a driver will need to call
usba_hcdi_register(9F) with a filled out usba_hcdi_register_args_t(9S)
structure. That registration structure includes the operation vector of
callbacks that the driver fills in, which involve opening and closing pipes
(usba_hcdi_pipe_open(9E)), issuing the various control, interrupt, bulk,
and isochronous transfers (usba_hcdi_pipe_bulk_xfer(9E), etc.), and more.
DTRACE PROBES
By default, the DTrace fbt(4D), function boundary tracing, provider will
create DTrace probes based on the entry and return points of most functions
in a module (the primary exception being for some hand-written assembler).
While this is very powerful, there are often times that driver writers want
to define their own semantic probes. The sdt(4D), statically defined
tracing, provider can be used for this.
To define an SDT probe, a driver should include <sys/sdt.h>, which defines
several macros for probes based on the number of arguments that are
present. Each probe takes a name, which is constrained by the rules of a C
identifier. If two underscore characters are present in a row (`__') they
will be transformed into a hyphen (`-'). That is, a probe declared with a
name of `hello__world' will be named `hello-world' and accessible as the
DTrace probe `sdt:::hello-world'.
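The naming rule can be illustrated with a small userland function. This is
only a sketch of the transformation described above, not the code the
kernel uses: demo_sdt_probe_name() is a hypothetical helper, and it assumes
a simple left-to-right scan in which each pair of adjacent underscores
becomes one hyphen.

```c
#include <stddef.h>

/*
 * Translate a C identifier style probe name into the name DTrace
 * shows: every "__" becomes "-", and single underscores are kept.
 * The output is always NUL-terminated if outlen is non-zero and is
 * truncated if it does not fit.
 */
void
demo_sdt_probe_name(const char *in, char *out, size_t outlen)
{
    size_t o = 0;

    if (outlen == 0)
        return;

    while (*in != '\0' && o + 1 < outlen) {
        if (in[0] == '_' && in[1] == '_') {
            out[o++] = '-';
            in += 2;
        } else {
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
}
```

Passing `hello__world' through this function yields `hello-world', which
matches the example in the text.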
Each probe can present a varying number of arguments in DTrace, ranging
from 0-8. For each DTrace probe argument, one passes both the type of the
argument and the actual value. The following example from the igc(4D)
driver shows a DTrace probe that provides four arguments and would be
accessible using the probe `sdt:::igc-context-desc':
DTRACE_PROBE4(igc__context__desc, igc_t *, igc, igc_tx_ring_t *,
    ring, igc_tx_state_t *, tx, struct igc_adv_tx_context_desc *,
    ctx);
In the above example, igc, ring, tx, and ctx are local variables and
function parameters.
By default SDT probes are considered Volatile, in other words they can
change at any time and disappear. This is used to encourage widespread use
of SDT probes for what may be useful for a particular problem or issue that
is being investigated. SDT probes that are stabilized are transformed into
their own first class provider.
SEE ALSO
Intro(9), Intro(9F), Intro(9S)
illumos May 23, 2024 illumos