Video: DTracing the Cloud

Brendan Gregg at illumos Day.

Cloud computing facilitates rapid deployment and scaling, often pushing high load at applications under continual development. DTrace allows immediate analysis of issues on live production systems even in these demanding environments – no need to restart or run a special debug kernel. For the illumos kernel, DTrace has been enhanced to support cloud computing, providing more observation capabilities to zones as used by Joyent SmartMachine customers. DTrace is also frequently used by the cloud operators to analyze systems and verify performance isolation of tenants. This talk covers DTrace in the illumos-based cloud, showing examples of real-world performance wins.

slides

DTracing the Cloud

Brendan Gregg

(Note: this is not the entire text of the slides – text surrounding or referring to diagrams and other images can be seen in the video.)

whoami

G’Day, I’m Brendan

These days I do performance analysis of the cloud

I use the right tool for the job; sometimes traditional, often DTrace.

DTrace is a magician that conjures up rainbows, ponies and unicorns — and does it all entirely safely and in production!

Or, the version with fewer ponies: DTrace is a performance analysis and troubleshooting tool

Instruments all software, kernel and user-land. Production safe. Designed for minimum overhead.
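For a flavor of what that looks like (my example, not from the slides), a classic one-liner counts system calls by process name, across everything running, until Ctrl-C:

```shell
# count syscalls by process name, system-wide, until Ctrl-C
dtrace -n 'syscall:::entry { @[execname] = count(); }'
```

One probe definition, one aggregation; the kernel does the counting in-context with minimal overhead.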

Default in SmartOS, Oracle Solaris, Mac OS X and FreeBSD. Two Linux ports are in development.

There’s a couple of awesome books about it.

illumos

Joyent’s SmartOS uses (and contributes to) the illumos kernel.

illumos is the most DTrace-featured kernel

illumos community includes Bryan Cantrill & Adam Leventhal, DTrace co-inventors.

Agenda

Theory: Cloud types and DTrace visibility

Reality

  • DTrace and Zones
  • DTrace Wins

Tools

  • DTrace Cloud Tools
  • Cloud Analytics


Cloud Types

We deploy two types of virtualization on SmartOS/illumos:

  • Hardware Virtualization: KVM
  • OS-Virtualization: Zones
  • Both virtualization types can co-exist

KVM

  • Used for Linux and Windows guests
  • Legacy apps

Zones

  • Used for SmartOS guests (zones) called SmartMachines
  • Preferred over Linux for:
    • Bare-metal performance
    • Less memory overhead
    • Better visibility (debugging)
  • Global Zone == host, Non-Global Zone == guest
  • Also used to encapsulate KVM guests (double-hull security)

DTrace can be used for:

  • Performance analysis: user- and kernel-level
  • Troubleshooting

Specifically, for the cloud:

  • Performance effects of multi-tenancy
  • Effectiveness and troubleshooting of performance isolation

Four contexts:

  • KVM host, KVM guest, Zones host, Zones guest
  • FAQ: What can DTrace see in each context?

Hardware Virtualization: DTrace Visibility

Host can see:

  • Entire host: kernel, apps
  • Guest disk I/O (block-interface-level)
  • Guest network I/O (packets)
  • Guest CPU MMU context register

Host can’t see:

  • Guest kernel
  • Guest apps
  • Guest disk/network context (kernel stack)
  • … unless the guest has DTrace, and access (SSH) is allowed

Hardware Virtualization: DTrace Visibility

Guest can see: Guest kernel, apps, provided DTrace is available

Guest can’t see:

  • Other guests
  • Host kernel, apps

OS Virtualization: DTrace Visibility

Host can see:

  • Entire host: kernel, apps
  • Entire guests: apps

Operators can trivially see the entire cloud

  • Direct visibility from host of all tenant processes

Zooming in, 1 host, 10 guests:

All can be examined with one DTrace invocation; no need for multiple SSH or API logins per guest. This reduces observability framework overhead by a factor of 10 (the guests-per-host ratio).
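As an illustrative sketch (mine, not from the slides): D's zonename built-in lets a single invocation from the global zone break activity down by tenant:

```shell
# from the global zone: count syscalls by zone and process name
dtrace -n 'syscall:::entry { @[zonename, execname] = count(); }'
```

One command summarizes every guest on the host at once.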

OS Virtualization: DTrace Visibility

Guest can see:

  • Guest apps
  • Some host kernel (in guest context), as configured by DTrace zone privileges

Guest can’t see:

  • Other guests
  • Host kernel (in non-guest context), apps

DTrace and Zones

DTrace and Zones were developed in parallel for Solaris 10, and then integrated.

DTrace functionality for the Global Zone (GZ) was added first.

  • This is the host context, and allows operators to use DTrace to inspect all tenants.

DTrace functionality for the Non-Global Zone (NGZ) was harder, and some capabilities were added later (2006):

  • Providers: syscall, pid, profile
  • This is the guest context, and allows customers to use DTrace to inspect themselves only (can’t see neighbors).

GZ DTrace works well. We found many issues in practice with NGZ DTrace:

  • Can’t read fds[] to translate file descriptors. Makes using the syscall provider more difficult.
  • Can’t read curpsinfo, curlwpsinfo, which breaks many scripts (eg, curpsinfo->pr_psargs, or curpsinfo->pr_dmodel)
  • Missing vminfo, sysinfo, and sched providers. Can’t read cpu built-in.
  • profile probes behave oddly. Eg, profile:::tick-1s only fires if tenant is on-CPU at the same time as the probe would fire. Makes any script that produces interval-output unreliable.
  • These and other bugs have since been fixed for SmartOS/illumos (thanks Bryan Cantrill!)
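To illustrate the tick problem, here is a hypothetical interval-output script of the kind that broke: in an early NGZ, the tick-1s probe only fired when the tenant happened to be on-CPU at that instant, so the per-second output could stall.

```d
#!/usr/sbin/dtrace -s
/* count read(2) syscalls, printing a summary every second */

syscall::read:entry
{
        @reads = count();
}

/* in early NGZ DTrace, this probe fired unreliably */
profile:::tick-1s
{
        printa("reads per second: %@d\n", @reads);
        trunc(@reads);
}
```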

DTrace Wins

Aside from the NGZ issues, DTrace has worked well in the cloud and solved numerous issues.

Tools

Ad-hoc

Write DTrace scripts as needed.

Execute individually on hosts, or, with ad-hoc scripting, execute across all hosts (the cloud).

My ad-hoc tools include:

  • DTrace Cloud Tools
  • Flame Graphs

Ad-hoc: DTrace Cloud Tools

Contains around 70 ad-hoc DTrace tools that I wrote for operators and cloud customers.

Customer scripts are linked from the “smartmachine” directory https://github.com/brendangregg/dtrace-cloud-tools

For example, tcplistendrop.d traces each kernel-dropped SYN due to TCP backlog overflow (saturation)

Can explain multi-second client connect latency.

tcplistendrop.d processes IP and TCP headers from the in-kernel packet buffer

Since this traces the fbt provider (kernel), it is operator only.

A related example: tcpconnreqmaxq-pid*.d prints a summary, showing backlog lengths (on SYN arrival), the current max, and drops
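A simplified sketch of the idea (not the actual tcplistendrop.d, which digs the client IP:port out of the packet headers via fbt): the illumos mib provider fires a probe on each listen drop, so even this minimal version can confirm backlog saturation:

```d
#!/usr/sbin/dtrace -s
/* print a timestamped line for each TCP listen drop (backlog overflow) */

mib:::tcpListenDrop
{
        printf("%Y TCP listen drop\n", walltimestamp);
}
```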

Ad-hoc: Flame Graphs

Visualizing CPU time using DTrace profiling and SVG
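The typical workflow (from my FlameGraph tools; the process name and output paths here are illustrative): sample user-level stacks at 97 Hz, then fold and render:

```shell
# sample mysqld user-level stacks at 97 Hz for 60 seconds
dtrace -x ustackframes=100 -n '
    profile-97 /execname == "mysqld" && arg1/ { @[ustack()] = count(); }
    tick-60s { exit(0); }' -o out.stacks

# fold and render as an SVG flame graph (scripts from the FlameGraph repo)
stackcollapse.pl out.stacks | flamegraph.pl > out.svg
```

97 Hz avoids lockstep sampling with timed kernel activity; the resulting SVG is interactive in a browser.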

Product

Cloud observability products including DTrace:

Joyent’s Cloud Analytics

  • For operators and cloud customers
  • Observes entire cloud, in real-time
  • Latency focus, including heat maps
  • Instrumentation: DTrace and kstats
  • Front-end: Browser JavaScript
  • Back-end: node.js and C

Case Studies

Slow disks

Customer complains of poor MySQL performance. They noticed busy disks via iostat-based monitoring software, and blamed noisy neighbors for causing disk I/O contention.

Multi-tenancy and performance isolation are common cloud issues.

By measuring FS latency in application-synchronous context, we can either confirm or rule out FS/disk-originated latency. This includes expressing FS latency as a proportion of MySQL query time, so the issue can be quantified and the potential speedup calculated.

Ideally, this would be possible from within the SmartMachine, so both customer and operator can run the DTrace script. This is possible using:

  • pid provider: trace and time MySQL FS functions
  • syscall provider: trace and time read/write syscalls for FS file descriptors (hence needing fds[].fi_fs; otherwise cache open())
  • mysql_pid_fslatency.d from dtrace-cloud-tools
    • Shows FS latency as a proportion of Query latency
    • mysqld_pid_fslatency_slowlog*.d in dtrace-cloud-tools
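A minimal sketch of the syscall-provider approach (my simplification, not the actual script): time read/write syscalls on ZFS file descriptors for mysqld. Note this relies on fds[], which is why the NGZ fds[] limitation mentioned earlier mattered so much.

```d
#!/usr/sbin/dtrace -s
/* latency distribution of mysqld read/write syscalls on ZFS files */

syscall::read:entry,
syscall::write:entry
/execname == "mysqld" && fds[arg0].fi_fs == "zfs"/
{
        self->ts = timestamp;
}

syscall::read:return,
syscall::write:return
/self->ts/
{
        @["FS I/O latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}
```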

The cloud operator can trace kernel internals. Eg, the VFS->ZFS interface using zfsslower.d:

  • My go-to tool (covers all apps). This example checked whether there was any VFS-level I/O taking longer than 10 ms (arg == 10).
  • Stupidly easy to do

zfs_read() entry -> return; same for zfs_write().
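A simplified sketch of that approach (the real zfsslower.d takes the millisecond threshold as an argument; here it is hard-coded to 10 ms):

```d
#!/usr/sbin/dtrace -s
/* print ZFS reads/writes taking longer than 10 ms */

fbt::zfs_read:entry,
fbt::zfs_write:entry
{
        self->start = timestamp;
}

fbt::zfs_read:return,
fbt::zfs_write:return
/self->start && timestamp - self->start > 10000000/
{
        printf("%Y %s %d ms\n", walltimestamp, probefunc,
            (timestamp - self->start) / 1000000);
}

fbt::zfs_read:return,
fbt::zfs_write:return
{
        self->start = 0;
}
```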

The operator can use deeper tools as needed. Anywhere in ZFS.

Cloud Analytics, for either operator or customer, can be used to examine the full latency distribution, including outliers

Found that the customer problem was not disks or FS (99% of the time), but was CPU usage during table joins.

On Joyent’s IaaS architecture, it’s usually not the disks or filesystem; useful to rule that out quickly.

Some of the time it is, due to:

  • Bad disks (1000+ms I/O)
  • Controller issues (PERC)
  • Big I/O (how quick is a 40 Mbyte read from cache?)
  • Other tenants (benchmarking!). Much less for us now with ZFS I/O throttling (thanks Bill Pijewski), used for disk performance isolation in the SmartOS cloud.

The customer resolved the real issue. Prior to DTrace analysis, they had spent months of poor performance believing the disks were to blame.

Kernel scheduler

Customer problem: occasional latency outliers

Analysis: no smoking gun. No slow I/O or locks, etc. Some random dispatcher queue latency, but with CPU headroom.

TS (and FSS) check for CPU starvation

Experimentation: run 2 CPU-bound threads on 1 CPU; examine subsecond-offset heat maps.

Worst case (4 threads, 1 CPU): 44 sec dispq latency

Required the operator of the cloud to debug

  • Even if the customer doesn’t have kernel-DTrace access in the zone, they still benefit from the cloud provider having access
  • Ask your cloud provider to trace scheduler internals, in case you have something similar

On Hardware Virtualization, scheduler issues can be terrifying

Each kernel believes it owns the hardware.

Had a networking performance issue on KVM; debugged using:

  • Host: DTrace
  • Guests: Prototype DTrace for Linux, SystemTap

Took weeks to debug the kernel scheduler interactions and determine the fix for an 8x win.

Thank you!

http://dtrace.org/blogs/brendan

twitter @brendangregg

Resources:

http://www.slideshare.net/bcantrill/dtrace-in-the-nonglobal-zone

http://dtrace.org/blogs/dap/2011/07/27/oscon-slides/

https://github.com/brendangregg/dtrace-cloud-tools

http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/

http://dtrace.org/blogs/brendan/2012/08/09/10-performance-wins/

http://dtrace.org/blogs/brendan/2011/10/04/visualizing-the-cloud/

Thanks @dapsays and team for Cloud Analytics, Bryan Cantrill for DTrace fixes, @rmustacc for KVM perf war, and @DeirdreS for another great event.
