Brendan Gregg at illumos Day.
Cloud computing facilitates rapid deployment and scaling, often pushing high load at applications under continual development. DTrace allows immediate analysis of issues on live production systems even in these demanding environments – no need to restart or run a special debug kernel. For the illumos kernel, DTrace has been enhanced to support cloud computing, providing more observation capabilities to zones as used by Joyent SmartMachine customers. DTrace is also frequently used by the cloud operators to analyze systems and verify performance isolation of tenants. This talk covers DTrace in the illumos-based cloud, showing examples of real-world performance wins.
DTracing the Cloud
Brendan Gregg
(Note: This is NOT the entire text of the slides – text surrounding/referring to diagrams and other images can be seen in the video.)
whoami
G’Day, I’m Brendan
These days I do performance analysis of the cloud
I use the right tool for the job; sometimes traditional, often DTrace.
DTrace is a magician that conjures up rainbows, ponies and unicorns — and does it all entirely safely and in production!
Or, the version with fewer ponies: DTrace is a performance analysis and troubleshooting tool
Instruments all software, kernel and user-land. Production safe. Designed for minimum overhead.
Default in SmartOS, Oracle Solaris, Mac OS X and FreeBSD. Two Linux ports are in development.
There’s a couple of awesome books about it.
illumos
Joyent’s SmartOS uses (and contributes to) the illumos kernel.
illumos is the most DTrace-featured kernel
illumos community includes Bryan Cantrill & Adam Leventhal, DTrace co-inventors.
Agenda
Theory: Cloud types and DTrace visibility
Reality
- DTrace and Zones
- DTrace Wins
Tools
- DTrace Cloud Tools
- Cloud Analytics
Cloud Types
We deploy two types of virtualization on SmartOS/illumos:
- Hardware Virtualization: KVM
- OS-Virtualization: Zones
- Both virtualization types can co-exist
KVM
- Used for Linux and Windows guests
- Legacy apps
Zones
- Used for SmartOS guests (zones) called SmartMachines
- Preferred over Linux:
- Bare-metal performance
- Less memory overheads
- Better visibility (debugging)
- Global Zone == host, Non-Global Zone == guest
- Also used to encapsulate KVM guests (double-hull security)
DTrace can be used for:
- Performance analysis: user- and kernel-level
- Troubleshooting
Specifically, for the cloud:
- Performance effects of multi-tenancy
- Effectiveness and troubleshooting of performance isolation
Four contexts:
- KVM host, KVM guest, Zones host, Zones guest
- FAQ: What can DTrace see in each context?
Hardware Virtualization: DTrace Visibility
Host can see:
- Entire host: kernel, apps
- Guest disk I/O (block-interface-level)
- Guest network I/O (packets)
- Guest CPU MMU context register
Host can’t see:
- Guest kernel
- Guest apps
- Guest disk/network context (kernel stack)
- … unless the guest has DTrace, and access (SSH) is allowed
Hardware Virtualization: DTrace Visibility
Guest can see: Guest kernel, apps, provided DTrace is available
Guest can’t see:
- Other guests
- Host kernel, apps
OS Virtualization: DTrace Visibility
Host can see:
- Entire host: kernel, apps
- Entire guests: apps
Operators can trivially see the entire cloud
- Direct visibility from host of all tenant processes
Zooming in, 1 host, 10 guests:
All can be examined with 1 DTrace invocation; don’t need multiple SSH or API logins per-guest. Reduces observability framework overhead by a factor of 10 (guests/host)
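As a minimal sketch of the "1 DTrace invocation" point, the following script, run from the global zone, counts system calls for every tenant at once. It assumes the zonename built-in variable available on illumos/Solaris; the 10-second duration is an arbitrary choice for illustration.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Count system calls by zone and process name, across all zones on the host. */
syscall:::entry
{
	@calls[zonename, execname] = count();
}

/* Stop after 10 seconds (arbitrary); the aggregation is printed on exit. */
profile:::tick-10s
{
	exit(0);
}
```

Keying the aggregation on zonename is what gives a per-tenant breakdown without logging in to each guest.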
OS Virtualization: DTrace Visibility
Guest can see:
- Guest apps
- Some host kernel (in guest context), as configured by DTrace zone privileges
Guest can’t see:
- Other guests
- Host kernel (in non-guest context), apps
DTrace and Zones
DTrace and Zones were developed in parallel for Solaris 10, and then integrated.
DTrace functionality for the Global Zone (GZ) was added first.
- This is the host context, and allows operators to use DTrace to inspect all tenants.
DTrace functionality for the Non-Global Zone (NGZ) was harder, and some capabilities added later (2006):
- Providers: syscall, pid, profile
- This is the guest context, and allows customers to use DTrace to inspect themselves only (can’t see neighbors).
GZ DTrace works well. We found many issues in practice with NGZ DTrace:
- Can’t read fds[] to translate file descriptors. Makes using the syscall provider more difficult.
- Can’t read curpsinfo, curlwpsinfo, which breaks many scripts (eg, curpsinfo->pr_psargs, or curpsinfo->pr_dmodel)
- Missing vminfo, sysinfo, and sched providers. Can’t read cpu built-in.
- profile probes behave oddly. Eg, profile:::tick-1s only fires if tenant is on-CPU at the same time as the probe would fire. Makes any script that produces interval-output unreliable.
- These and other bugs have since been fixed for SmartOS/illumos (thanks Bryan Cantrill!)
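To illustrate why the profile:::tick-1s behavior mattered, here is a sketch of the common interval-output pattern that depends on it. It is not taken from any particular tool; the aggregation name and output format are illustrative.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Count system calls by process name. */
syscall:::entry
{
	@calls[execname] = count();
}

/*
 * Print and reset the counts once per second. If tick-1s does not fire
 * reliably, these per-second summaries are skipped or cover uneven intervals.
 */
profile:::tick-1s
{
	printf("%Y\n", walltimestamp);
	printa("  %-16s %@d\n", @calls);
	trunc(@calls);
}
```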
DTrace Wins
Aside from the NGZ issues, DTrace has worked well in the cloud and solved numerous issues.
Tools: Ad-hoc
Write DTrace scripts as needed
Execute individually on hosts, or,
With ad-hoc scripting, execute across all hosts (cloud)
My ad-hoc tools include:
- DTrace Cloud Tools
- Flame Graphs
Ad-hoc: DTrace Cloud Tools
Contains around 70 ad-hoc DTrace tools written by myself for operators and cloud customers.
Customer scripts are linked from the “smartmachine” directory https://github.com/brendangregg/dtrace-cloud-tools
For example, tcplistendrop.d traces each kernel-dropped SYN due to TCP backlog overflow (saturation)
Can explain multi-second client connect latency.
tcplistendrop.d processes IP and TCP headers from the in-kernel packet buffer
Since this traces the fbt provider (kernel), it is operator only.
A related example: tcpconnreqmaxq-pid*.d prints a summary, showing backlog lengths (on SYN arrival), the current max, and drops
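A much simpler way to see whether listen drops are happening at all is to count them per second. The sketch below is not tcplistendrop.d: it does not read packet headers, and it uses the mib provider's tcpListenDrop probe rather than fbt. It assumes it is run from the global zone.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Fires when a connection is refused because the listen backlog is full. */
mib:::tcpListenDrop
{
	@drops = count();
}

/*
 * Print and reset the drop count once per second.
 * Nothing is printed until the first drop has been seen.
 */
profile:::tick-1s
{
	printf("%Y ", walltimestamp);
	printa("listen drops: %@d\n", @drops);
	clear(@drops);
}
```

A non-zero count here is a cue to reach for the fuller tools, which show which connections were dropped.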
Ad-hoc: Flame Graphs
Visualizing CPU time using DTrace profiling and SVG
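The DTrace profiling half can be sketched as follows: sample user-level stacks at 99 Hertz and count identical stacks. The process name "mysqld", the 60-second duration and the frame limit are assumptions chosen for illustration.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option ustackframes=100

/*
 * Sample at 99 Hz. arg1 is the user-level program counter, so it is
 * non-zero only when the sampled thread was running in user-land.
 */
profile:::profile-99
/execname == "mysqld" && arg1/
{
	@stacks[ustack()] = count();
}

/* Stop after 60 seconds; stacks and their counts are printed on exit. */
profile:::tick-60s
{
	exit(0);
}
```

The output (saved with dtrace -o) is then folded and rendered to SVG with stackcollapse.pl and flamegraph.pl from the FlameGraph repository.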
Product
Cloud observability products including DTrace:
Joyent’s Cloud Analytics
- For operators and cloud customers
- Observes entire cloud, in real-time
- Latency focus, including heat maps
- Instrumentation: DTrace and kstats
- Front-end: Browser JavaScript
- Back-end: node.js and C
Case Studies
Slow disks
Customer complains of poor MySQL performance. They noticed the disks are busy via iostat-based monitoring software, and have blamed noisy neighbors causing disk I/O contention.
Multi-tenancy and performance isolation are common cloud issues
By measuring FS latency in application-synchronous context we can either confirm or rule out FS/disk origin latency. This includes expressing FS latency as a portion of MySQL query time, so that the issue can be quantified and the potential speedup calculated.
Ideally, this would be possible from within the SmartMachine, so both customer and operator can run the DTrace script. This is possible using:
- pid provider: trace and time MySQL FS functions
- syscall provider: trace and time read/write syscalls for FS file descriptors (hence needing fds[].fi_fs; otherwise cache open())
- mysql_pid_fslatency.d from dtrace-cloud-tools
- Shows FS latency as a proportion of Query latency
- mysqld_pid_fslatency_slowlog*.d in dtrace-cloud-tools
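The syscall-provider approach can be sketched like this. It is not one of the dtrace-cloud-tools scripts; it assumes the database files are on ZFS and that the script is run with -p PID so that $target is the MySQL process.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Time read/write syscalls on ZFS file descriptors, for the target PID only. */
syscall::read:entry,
syscall::write:entry
/pid == $target && fds[arg0].fi_fs == "zfs"/
{
	self->start = timestamp;
}

/* Record latency (nanoseconds) as a power-of-two distribution per syscall. */
syscall::read:return,
syscall::write:return
/self->start/
{
	@latency[probefunc] = quantize(timestamp - self->start);
	self->start = 0;
}
```

Usage would be along the lines of: dtrace -s script.d -p PID, then Ctrl-C to print the distributions. The fds[arg0].fi_fs predicate is the part that needs fds[] to work in the non-global zone.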
The cloud operator can trace kernel internals. Eg, the VFS->ZFS interface using zfsslower.d:
- My go-to tool (does all apps). This example checked whether there was any VFS-level I/O slower than 10 ms (arg == 10)
- Stupidly easy to do
zfs_read() entry -> return; same for zfs_write().
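A stripped-down sketch of that entry -> return timing is below. It is not zfsslower.d itself (which prints more detail, such as file names); it assumes the fbt provider can see zfs_read() and zfs_write(), and takes the threshold in milliseconds as its first argument.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Record the start time when a thread enters zfs_read() or zfs_write(). */
fbt::zfs_read:entry,
fbt::zfs_write:entry
{
	self->start = timestamp;
}

/* On return, print calls that took at least $1 milliseconds. */
fbt::zfs_read:return,
fbt::zfs_write:return
/self->start && (timestamp - self->start) >= $1 * 1000000/
{
	printf("%-16s %-10s %d ms\n", execname, probefunc,
	    (timestamp - self->start) / 1000000);
}

/* Always clear the thread-local variable on return. */
fbt::zfs_read:return,
fbt::zfs_write:return
{
	self->start = 0;
}
```

Run as: dtrace -s script.d 10, from the global zone, to match the 10 ms example.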
The operator can use deeper tools as needed. Anywhere in ZFS.
Cloud Analytics, for either operator or customer, can be used to examine the full latency distribution, including outliers
Found that the customer problem was not disks or FS (99% of the time), but was CPU usage during table joins.
On Joyent’s IaaS architecture, it’s usually not the disks or filesystem; useful to rule that out quickly.
Some of the time it is, due to:
- Bad disks (1000+ms I/O)
- Controller issues (PERC)
- Big I/O (how quick is a 40 Mbyte read from cache?)
- Other tenants (benchmarking!). Much less for us now with ZFS I/O throttling (thanks Bill Pijewski), used for disk performance isolation in the SmartOS cloud.
- Customer resolved real issue
- Prior to DTrace analysis, had spent months of poor performance believing disks were to blame
Kernel scheduler
Customer problem: occasional latency outliers
Analysis: no smoking gun. No slow I/O or locks, etc. Some random dispatcher queue latency, but with CPU headroom.
TS (and FSS) check for CPU starvation
Experimentation: run 2 CPU-bound threads, 1 CPU
Subsecond offset heat maps
Worst case (4 threads 1 CPU), 44 sec dispq latency
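Dispatcher queue latency of this kind can be measured with the sched provider. The sketch below follows the well-known enqueue/dequeue timing pattern and is not the exact script used in this investigation; the associative array name is arbitrary.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Note the time a thread is placed on a CPU's run queue. */
sched:::enqueue
{
	qstart[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id] = timestamp;
}

/* On dequeue, record how long the thread waited (nanoseconds), per CPU. */
sched:::dequeue
/qstart[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id]/
{
	@wait[args[2]->cpu_id] = quantize(timestamp -
	    qstart[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id]);
	qstart[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id] = 0;
}
```

Outliers show up in the top buckets of the per-CPU distributions printed on Ctrl-C.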
Required the operator of the cloud to debug
- Even if the customer doesn’t have kernel-DTrace access in the zone, they still benefit from the cloud provider having access
- Ask your cloud provider to trace scheduler internals, in case you have something similar
On Hardware Virtualization, scheduler issues can be terrifying
Each kernel believes it owns the hardware.
Had a networking performance issue on KVM; debugged using:
- Host: DTrace
- Guests: Prototype DTrace for Linux, SystemTap
Took weeks to debug the kernel scheduler interactions and determine the fix for an 8x win.
Thank you!
http://dtrace.org/blogs/brendan
twitter @brendangregg
Resources:
http://www.slideshare.net/bcantrill/dtrace-in-the-nonglobal-zone
http://dtrace.org/blogs/dap/2011/07/27/oscon-slides/
https://github.com/brendangregg/dtrace-cloud-tools
http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
http://dtrace.org/blogs/brendan/2012/08/09/10-performance-wins/
http://dtrace.org/blogs/brendan/2011/10/04/visualizing-the-cloud/
Thanks @dapsays and team for Cloud Analytics, Bryan Cantrill for DTrace fixes, @rmustacc for KVM perf war, and @DeirdreS for another great event.