illumos: manual page: amd_f17h_zen1

AMD_F17H_ZEN1_EVENTS(3CPC) CPU Performance Counters Library Functions

NAME

amd_f17h_zen1_events - AMD Family 17h Zen1 processor performance monitoring
events

DESCRIPTION

This manual page describes events specific to AMD Family 17h Zen1
processors. For more information, please consult the appropriate AMD BIOS
and Kernel Developer's Guide or Open-Source Register Reference.

Each of the events listed below includes the AMD mnemonic which matches the
name found in the AMD manual and a brief summary of the event. If
available, a more detailed description of the event follows and then any
additional unit values that modify the event. Each unit can be combined to
create a new event in the system by placing the '.' character between the
event name and the unit name.
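
As a minimal sketch of the naming rule above, the following Python helper
joins an event name and a unit name with the '.' character. The helper name
qualified_event is hypothetical and is not part of any illumos library; it
only illustrates how the combined names are formed.

```python
def qualified_event(event, unit=None):
    """Return 'event.unit', or just 'event' when no unit is given."""
    if unit is None:
        return event
    return f"{event}.{unit}"


# Event and unit names below are taken from the list in this page.
names = [
    qualified_event("FpRetSseAvxOps", "DpMultAddFlops"),
    qualified_event("LsDispatch"),
]
```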

The following events are supported:

FpuPipeAssignment
Core::X86::Pmc::Core::FpuPipeAssignment - FPU Pipe Assignment

The number of operations (uOps) and dual-pipe uOps dispatched to
each of the 4 FPU execution pipelines. This event reflects how busy
the FPU pipelines are and may be used for workload
characterization. This includes all operations performed by x87,
MMX(TM), and SSE instructions, including moves. Each increment
represents a one-cycle dispatch event. This event is a speculative
event. (See Core::X86::Pmc::Core::ExRetMmxFpInstr). Since this
event includes non-numeric operations it is not suitable for
measuring MFLOPS.

This event has the following units which may be used to modify the
behavior of the event:

Dual3 Total number of multi-pipe uOps assigned to Pipe 3

Dual2 Total number of multi-pipe uOps assigned to Pipe 2

Dual1 Total number of multi-pipe uOps assigned to Pipe 1

Dual0 Total number of multi-pipe uOps assigned to Pipe 0

Total3 Total number of uOps assigned to Pipe 3

Total2 Total number of uOps assigned to Pipe 2

Total1 Total number of uOps assigned to Pipe 1

Total0 Total number of uOps assigned to Pipe 0
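
For workload characterization, one way to read the per-pipe Total units is
as a share of all dispatched uOps. The sketch below assumes the four counts
were collected separately over the same interval; the function name and the
sample numbers are hypothetical.

```python
def pipe_shares(total_counts):
    """Map each pipe's Total<n> count to its fraction of all dispatched uOps."""
    total = sum(total_counts.values())
    if total == 0:
        # No uOps were dispatched, so every share is zero.
        return {pipe: 0.0 for pipe in total_counts}
    return {pipe: count / total for pipe, count in total_counts.items()}


# Hypothetical sample counts, not measured data.
sample = {"Total0": 400, "Total1": 300, "Total2": 200, "Total3": 100}
shares = pipe_shares(sample)
```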

FpSchedEmpty
Core::X86::Pmc::Core::FpSchedEmpty - FP Scheduler Empty

This is a speculative event. The number of cycles in which the FPU
scheduler is empty. Note that some Ops like FP loads bypass the
scheduler. Invert this (Core::X86::Msr::PERF_CTL[Inv] == 1) to
count cycles in which at least one FPU operation is present in the
FPU.
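
The inversion mentioned above is a single bit in the PERF_CTL register. The
sketch below packs a raw PERF_CTL value and sets that bit. The bit positions
(Inv at bit 23, En at bit 22, unit mask in bits 15:8, event select split
across bits 7:0 and 35:32) are assumptions based on AMD's documented
register layout and should be checked against the register reference. The
function name and the event select value 0x001 are placeholders, and the
user/OS mode bits and other fields are deliberately left out.

```python
INV_BIT = 1 << 23  # PERF_CTL[Inv], assumed position
EN_BIT = 1 << 22   # PERF_CTL[En], assumed position


def perf_ctl(event_select, unit_mask=0, invert=False, enable=True):
    """Pack a 12-bit event select and 8-bit unit mask into a PERF_CTL value."""
    if not 0 <= event_select <= 0xFFF:
        raise ValueError("event_select must fit in 12 bits")
    if not 0 <= unit_mask <= 0xFF:
        raise ValueError("unit_mask must fit in 8 bits")
    value = event_select & 0xFF            # EventSelect[7:0] in bits 7:0
    value |= unit_mask << 8                # UnitMask in bits 15:8
    value |= (event_select >> 8) << 32     # EventSelect[11:8] in bits 35:32
    if invert:
        value |= INV_BIT
    if enable:
        value |= EN_BIT
    return value


# 0x001 is a placeholder event select; consult the AMD manual for real codes.
normal = perf_ctl(0x001)
inverted = perf_ctl(0x001, invert=True)
```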

FpRetx87FpOps
Core::X86::Pmc::Core::FpRetx87FpOps - Retired x87 Floating Point
Operations

The number of x87 floating-point Ops that have retired. The number
of events logged per cycle can vary from 0 to 8.

This event has the following units which may be used to modify the
behavior of the event:

DivSqrROps
Divide and square root Ops

MulOps Multiply Ops

AddSubOps
Add/subtract Ops

FpRetSseAvxOps
Core::X86::Pmc::Core::FpRetSseAvxOps - Retired SSE/AVX Operations

This is a retire-based event. The number of retired SSE/AVX FLOPS.
The number of events logged per cycle can vary from 0 to 64. This
event can count above 15. See 2.1.11.2 [Large Increment per Cycle
Events] in the AMD manual.

This event has the following units which may be used to modify the
behavior of the event:

DpMultAddFlops
Double precision multiply-add FLOPS. Multiply-add counts as
2 FLOPS.

DpDivFlops
Double precision divide/square root FLOPS.

DpMultFlops
Double precision multiply FLOPS.

DpAddSubFlops
Double precision add/subtract FLOPS.

SpMultAddFlops
Single precision multiply-add FLOPS. Multiply-add counts as
2 FLOPS.

SpDivFlops
Single precision divide/square root FLOPS.

SpMultFlops
Single precision multiply FLOPS.

SpAddSubFlops
Single precision add/subtract FLOPS.
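
Because this event already counts FLOPS (with multiply-add counted as 2), a
throughput figure needs only the summed unit counts and an elapsed time. The
following is a sketch under the assumption that the elapsed time is measured
separately by the caller; the function name and sample values are
hypothetical.

```python
def gflops(flop_count, elapsed_seconds):
    """Convert a FLOPS count and an elapsed time into GFLOPS."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return flop_count / elapsed_seconds / 1e9


# Hypothetical per-unit counts; multiply-add needs no extra scaling here
# because the counter already counts it as 2 FLOPS.
counts = {"DpMultAddFlops": 1_500_000_000, "DpAddSubFlops": 500_000_000}
rate = gflops(sum(counts.values()), 0.5)
```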

FpNumMovElimScalOp
Core::X86::Pmc::Core::FpNumMovElimScalOp - Number of Move
Elimination and Scalar Op Optimization

This is a dispatch based speculative event, and is useful for
measuring the effectiveness of the Move elimination and Scalar code
optimization schemes.

This event has the following units which may be used to modify the
behavior of the event:

Optimized
Number of Scalar Ops optimized

OptPotential
Number of Ops that are candidates for optimization (have
Z-bit either set or pass).

SseMovOpsElim
Number of SSE Move Ops eliminated

SseMovOps
Number of SSE Move Ops
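
One plausible way to express the effectiveness mentioned above is as two
ratios: eliminated moves over all SSE moves, and optimized scalar Ops over
optimization candidates. This pairing of units is an interpretation, not
something the AMD manual prescribes, and the sample counts are hypothetical.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical counts keyed by unit name.
counts = {
    "SseMovOps": 1000,
    "SseMovOpsElim": 250,
    "OptPotential": 120,
    "Optimized": 30,
}
move_elim_rate = ratio(counts["SseMovOpsElim"], counts["SseMovOps"])
scalar_opt_rate = ratio(counts["Optimized"], counts["OptPotential"])
```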

FpRetiredSerOps
Core::X86::Pmc::Core::FpRetiredSerOps - Retired Serializing Ops

The number of serializing Ops retired.

This event has the following units which may be used to modify the
behavior of the event:

X87CtrlRet
x87 control word mispredict traps due to mispredictions in
RC or PC, or changes in mask bits

X87BotRet
x87 bottom-executing uOps retired

SseCtrlRet
SSE control word mispredict traps due to mispredictions in
RC, FTZ or DAZ, or changes in mask bits

SseBotRet
SSE bottom-executing uOps retired

LsBadStatus2
Core::X86::Pmc::Core::LsBadStatus2 - Bad Status 2

Store To Load Interlock (STLI) events are loads that were unable to
complete because of a possible match with an older store, where the
older store could not do STLF for some reason. There are a number
of reasons why this occurs, and this event organizes them into
three major groups.

This event has the following units which may be used to modify the
behavior of the event:

StlfNoData
The load is capable of forwarding from an older store (i.e.
the address match/overlap between the load and the older
store was good and everything works from an address
perspective), but the store's data has not been produced by
EX or FP yet so it can't be forwarded.

StliOther
All the other reasons. The most common among these is that
there is only a partial overlap between the store and the
load, for example there's an 8B store to address A and a
16B load starting at address A. STLF can't be performed in
this case because only some of the load's data is coming
from the store, so the load gets StliOther. Another
StliOther case is if the load hits a non-cacheable store
that's sitting in the non-cacheable buffers (WCBs).

StliNoState
The STLF is validated using DC way instead of an address
compare. The store that wants to STLF is required to be a
DC hit and have a valid DC way. The STLF candidate store is
chosen based on address bits 11:0 overlap, and the DC way
of that store is compared to the way of the load. If the
store is in a DC miss state, then it doesn't have a valid
DC way and so cannot validate STLF. The load gets
StliNoState and can't complete.

LsLocks
Core::X86::Pmc::Core::LsLocks - Locks

LsRetClClush
Core::X86::Pmc::Core::LsRetClClush - Retired CLFLUSH Instructions

The number of retired CLFLUSH instructions. This is a
non-speculative event.

LsRetCpuid
Core::X86::Pmc::Core::LsRetCpuid - Retired CPUID Instructions

The number of CPUID instructions retired.

LsDispatch
Core::X86::Pmc::Core::LsDispatch - LS Dispatch

Counts the number of operations dispatched to the LS unit.

LsSmiRx
Core::X86::Pmc::Core::LsSmiRx - SMIs Received

Counts the number of SMIs received.

LsSTLF Core::X86::Pmc::Core::LsSTLF - Store to Load Forward

Number of STLF hits.

LsStCommitCancel2
Core::X86::Pmc::Core::LsStCommitCancel2 - Store Commit Cancels 2

This event has the following units which may be used to modify the
behavior of the event:

StCommitCancelWcbFull
A non-cacheable store was cancelled because the non-cacheable
commit buffer is full.

LsDcAccesses
Core::X86::Pmc::Core::LsDcAccesses - Data Cache Accesses

The number of accesses to the data cache for load and store
references. This may include certain microcode scratchpad accesses,
although these are generally rare. Each increment represents an
eight-byte access, although the instruction may only be accessing a
portion of that. This event is a speculative event.

LsRefillsFromSys
Core::X86::Pmc::Core::LsRefillsFromSys - Data Cache Refills from
System

Demand Data Cache Fills by Data Source.

This event has the following units which may be used to modify the
behavior of the event:

LS_MABRESP_RMT_DRAM
DRAM or IO from different die.

LS_MABRESP_RMT_CACHE
Hit in cache; Remote CCX and the address's Home Node is on
a different die.

LS_MABRESP_LCL_DRAM
DRAM or IO from this thread's die.

LS_MABRESP_LCL_CACHE
Hit in cache; local CCX (not Local L2), or Remote CCX and
the address's Home Node is on this thread's die.

MABRESP_LCL_L2
Local L2 hit.
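
The data source units lend themselves to a local versus remote breakdown.
The sketch below computes the fraction of fills served from a different
die, assuming each unit was counted separately over the same interval. The
function name and sample counts are hypothetical; the same calculation
would apply to the other events in this page that share these unit names.

```python
REMOTE_UNITS = ("LS_MABRESP_RMT_DRAM", "LS_MABRESP_RMT_CACHE")


def remote_fraction(counts):
    """Return the share of fills whose data source is on a different die."""
    total = sum(counts.values())
    remote = sum(counts.get(unit, 0) for unit in REMOTE_UNITS)
    return remote / total if total else 0.0


# Hypothetical per-unit counts, not measured data.
sample = {
    "LS_MABRESP_RMT_DRAM": 10,
    "LS_MABRESP_RMT_CACHE": 15,
    "LS_MABRESP_LCL_DRAM": 25,
    "LS_MABRESP_LCL_CACHE": 20,
    "MABRESP_LCL_L2": 30,
}
fraction = remote_fraction(sample)
```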

LsL1DTlbMiss
Core::X86::Pmc::Core::LsL1DTlbMiss - L1 DTLB Miss

This event has the following units which may be used to modify the
behavior of the event:

TlbReload1GL2Miss

TlbReload2ML2Miss

TlbReload32KL2Miss

TlbReload4KL2Miss

TlbReload1GL2Hit

TlbReload2ML2Hit

TlbReload32KL2Hit

TlbReload4KL2Hit
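
The units above carry no description in this page. Assuming the names mean
what they suggest (an L1 DTLB miss for a given page size that then hit or
missed in the L2 DTLB), the following sketch computes how often the L2 DTLB
satisfies an L1 DTLB miss. The function name and sample counts are
hypothetical.

```python
PAGE_SIZES = ("4K", "32K", "2M", "1G")


def l2_dtlb_hit_fraction(counts):
    """Return the share of L1 DTLB misses that hit in the L2 DTLB."""
    hits = sum(counts.get(f"TlbReload{size}L2Hit", 0) for size in PAGE_SIZES)
    misses = sum(counts.get(f"TlbReload{size}L2Miss", 0) for size in PAGE_SIZES)
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical per-unit counts; units that were not collected count as zero.
sample = {
    "TlbReload4KL2Hit": 60,
    "TlbReload2ML2Hit": 15,
    "TlbReload4KL2Miss": 20,
    "TlbReload1GL2Miss": 5,
}
hit_fraction = l2_dtlb_hit_fraction(sample)
```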

LsTablewalker
Core::X86::Pmc::Core::LsTablewalker - Tablewalker allocation

This event has the following units which may be used to modify the
behavior of the event:

PerfMonTablewalkAllocIside1

PerfMonTablewalkAllocIside0

PerfMonTablewalkAllocDside1

PerfMonTablewalkAllocDside0

LsMisalAccesses
Core::X86::Pmc::Core::LsMisalAccesses - Misaligned loads

LsPrefInstrDisp
Core::X86::Pmc::Core::LsPrefInstrDisp - Prefetch Instructions
Dispatched

Software Prefetch Instructions Dispatched.

This event has the following units which may be used to modify the
behavior of the event:

PrefetchNTA

StorePrefetchW

LoadPrefetchW
Prefetch, Prefetch_T0_T1_T2

LsInefSwPref
Core::X86::Pmc::Core::LsInefSwPref - Ineffective Software Prefetches

The number of software prefetches that did not fetch data outside
of the processor core.

This event has the following units which may be used to modify the
behavior of the event:

MabMchCnt
Software PREFETCH instruction saw a match on an
already-allocated miss request buffer.

DataPipeSwPfDcHit
Software PREFETCH instruction saw a DC hit.

LsSwPfDcFills
Core::X86::Pmc::Core::LsSwPfDcFills - Software Prefetch Data Cache
Fills

Software Prefetch Data Cache Fills by Data Source

This event has the following units which may be used to modify the
behavior of the event:

LS_MABRESP_RMT_DRAM
DRAM or IO from different die.

LS_MABRESP_RMT_CACHE
Hit in cache; Remote CCX and the address's Home Node is on
a different die.

LS_MABRESP_LCL_DRAM
DRAM or IO from this thread's die.

LS_MABRESP_LCL_CACHE
Hit in cache; local CCX (not Local L2), or Remote CCX and
the address's Home Node is on this thread's die.

MABRESP_LCL_L2
Local L2 hit.

LsHwPfDcFills
Core::X86::Pmc::Core::LsHwPfDcFills - Hardware Prefetch Data Cache
Fills

Hardware Prefetch Data Cache Fills by Data Source

This event has the following units which may be used to modify the
behavior of the event:

LS_MABRESP_RMT_DRAM
DRAM or IO from different die.

LS_MABRESP_RMT_CACHE
Hit in cache; Remote CCX and the address's Home Node is on
a different die.

LS_MABRESP_LCL_DRAM
DRAM or IO from this thread's die.

LS_MABRESP_LCL_CACHE
Hit in cache; local CCX (not Local L2), or Remote CCX and
the address's Home Node is on this thread's die.

MABRESP_LCL_L2
Local L2 hit.

LsTwDcFills
Core::X86::Pmc::Core::LsTwDcFills - Table Walker Data Cache Fills
by Data Source

This event has the following units which may be used to modify the
behavior of the event:

LS_MABRESP_RMT_DRAM
DRAM or IO from different die.

LS_MABRESP_RMT_CACHE
Hit in cache; Remote CCX and the address's Home Node is on
a different die.

LS_MABRESP_LCL_DRAM
DRAM or IO from this thread's die.

LS_MABRESP_LCL_CACHE
Hit in cache; local CCX (not Local L2), or Remote CCX and
the address's Home Node is on this thread's die.

MABRESP_LCL_L2
Local L2 hit.

LsNotHaltedCyc
Core::X86::Pmc::Core::LsNotHaltedCyc - Cycles not in Halt

IcFw32 Core::X86::Pmc::Core::IcFw32 - 32 Byte Instruction Cache Fetch

The number of 32B fetch windows transferred from IC pipe to DE
instruction decoder (includes non-cacheable and cacheable fill
responses).

IcFw32Miss
Core::X86::Pmc::Core::IcFw32Miss - 32 Byte Instruction Cache Misses

The number of 32B fetch windows that tried to read the L1 IC and
missed in the full tag.

IcCacheFillL2
Core::X86::Pmc::Core::IcCacheFillL2 - Instruction Cache Refills
from L2

The number of 64-byte instruction cache lines fulfilled from the
L2 cache.

IcCacheFillSys
Core::X86::Pmc::Core::IcCacheFillSys - Instruction Cache Refills
from System

The number of 64-byte instruction cache lines fulfilled from system
memory or another cache.

BpL1TlbMissL2Hit
Core::X86::Pmc::Core::BpL1TlbMissL2Hit - L1 ITLB Miss, L2 ITLB Hit

The number of instruction fetches that miss in the L1 ITLB but hit
in the L2 ITLB.

BpL1TlbMissL2Miss
Core::X86::Pmc::Core::BpL1TlbMissL2Miss - L1 ITLB Miss, L2 ITLB
Miss

The number of instruction fetches that miss in both the L1 and L2
ITLBs.

IcFetchStall
Core::X86::Pmc::Core::IcFetchStall - Instruction Pipe Stall

This event has the following units which may be used to modify the
behavior of the event:

IcStallAny
Instruction Cache pipeline was stalled during this clock
cycle for any reason.

IcStallDqEmpty
Instruction Cache pipeline was stalled during this clock
cycle due to upstream not providing fetch addresses
quickly.

IcStallBackPressure
Instruction Cache pipeline was stalled during this clock
cycle due to downstream queues being full.

BpL1BTBCorrect
Core::X86::Pmc::Core::BpL1BTBCorrect - L1 BTB Correction

BpL2BTBCorrect
Core::X86::Pmc::Core::BpL2BTBCorrect - L2 BTB Correction

IcCacheInval
Core::X86::Pmc::Core::IcCacheInval - Instruction Cache Lines
Invalidated

The number of instruction cache lines invalidated. A non-SMC event
is CMC (cross modifying code), either from the other thread of the
core or another core.

This event has the following units which may be used to modify the
behavior of the event:

L2InvalidatingProbe
IC line invalidated due to L2 invalidating probe (external
or LS).

FillInvalidated
IC line invalidated due to overwriting fill response.

BpTlbRel
Core::X86::Pmc::Core::BpTlbRel - ITLB Reloads

The number of ITLB reload requests.

IcOcModeSwitch
Core::X86::Pmc::Core::IcOcModeSwitch - OC Mode Switch

This event has the following units which may be used to modify the
behavior of the event:

OcIcModeSwitch
OC to IC mode switch

IcOcModeSwitch
IC to OC mode switch

DeDisDispatchTokenStalls0
Core::X86::Pmc::Core::DeDisDispatchTokenStalls0 - Dynamic Tokens
Dispatch Stall Cycles 0

Cycles where a dispatch group is valid but does not get dispatched
due to a token stall.

This event has the following units which may be used to modify the
behavior of the event:

RetireTokenStall
RETIRE Tokens unavailable

AGSQTokenStall
AGSQ Tokens unavailable

ALUTokenStall
ALU tokens total unavailable

ALSQ3_0_TokenStall

ALSQ3TokenStall
ALSQ 3 Tokens unavailable

ALSQ2TokenStall
ALSQ 2 Tokens unavailable

ALSQ1TokenStall
ALSQ 1 Tokens unavailable

ExRetInstr
Core::X86::Pmc::Core::ExRetInstr - Retired Instructions

ExRetCops
Core::X86::Pmc::Core::ExRetCops - Retired Uops

The number of uOps retired. This includes all processor activity
(instructions, exceptions, interrupts, microcode assists, etc.).
The number of events logged per cycle can vary from 0 to 4.

ExRetBrn
Core::X86::Pmc::Core::ExRetBrn - Retired Branch Instructions

The number of branch instructions retired. This includes all types
of architectural control flow changes, including exceptions and
interrupts.

ExRetBrnMisp
Core::X86::Pmc::Core::ExRetBrnMisp - Retired Branch Instructions
Mispredicted

The number of branch instructions retired, of any type, that were
not correctly predicted. This includes those for which prediction
is not attempted (far control transfers, exceptions and
interrupts).
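
A common use of this event is to relate it to ExRetBrn and ExRetInstr. The
sketch below derives a misprediction rate and mispredictions per 1000
retired instructions, assuming the three counts cover the same interval.
The function names and sample counts are hypothetical.

```python
def mispredict_rate(ex_ret_brn_misp, ex_ret_brn):
    """Fraction of retired branches that were mispredicted."""
    return ex_ret_brn_misp / ex_ret_brn if ex_ret_brn else 0.0


def mispredicts_per_kilo_instr(ex_ret_brn_misp, ex_ret_instr):
    """Mispredicted branches per 1000 retired instructions."""
    return 1000 * ex_ret_brn_misp / ex_ret_instr if ex_ret_instr else 0.0


# Hypothetical counts for ExRetBrnMisp, ExRetBrn and ExRetInstr.
rate = mispredict_rate(50, 1000)
mpki = mispredicts_per_kilo_instr(50, 10000)
```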

ExRetBrnTkn
Core::X86::Pmc::Core::ExRetBrnTkn - Retired Taken Branch
Instructions

The number of taken branches that were retired. This includes all
types of architectural control flow changes, including exceptions
and interrupts.

ExRetBrnTknMisp
Core::X86::Pmc::Core::ExRetBrnTknMisp - Retired Taken Branch
Instructions Mispredicted

The number of retired taken branch instructions that were
mispredicted.

ExRetBrnFar
Core::X86::Pmc::Core::ExRetBrnFar - Retired Far Control Transfers

The number of far control transfers retired including far
call/jump/return, IRET, SYSCALL and SYSRET, plus exceptions and
interrupts. Far control transfers are not subject to branch
prediction.

ExRetBrnResync
Core::X86::Pmc::Core::ExRetBrnResync - Retired Branch Resyncs

The number of resync branches. These reflect pipeline restarts due
to certain microcode assists and events such as writes to the
active instruction stream, among other things. Each occurrence
reflects a restart penalty similar to a branch mispredict. This is
relatively rare.

ExRetNearRet
Core::X86::Pmc::Core::ExRetNearRet - Retired Near Returns

The number of near return instructions (RET or RET Iw) retired.

ExRetNearRetMispred
Core::X86::Pmc::Core::ExRetNearRetMispred - Retired Near Returns
Mispredicted

The number of near returns retired that were not correctly
predicted by the return address predictor. Each such mispredict
incurs the same penalty as a mispredicted conditional branch
instruction.

ExRetBrnIndMisp
Core::X86::Pmc::Core::ExRetBrnIndMisp - Retired Indirect Branch
Instructions Mispredicted

ExRetMmxFpInstr
Core::X86::Pmc::Core::ExRetMmxFpInstr - Retired MMX(TM)/FP
Instructions

The number of MMX, SSE or x87 instructions retired. The UnitMask
allows the selection of the individual classes of instructions as
given in the table. Each increment represents one complete
instruction. Since this event includes non-numeric instructions it
is not suitable for measuring MFLOPS.

This event has the following units which may be used to modify the
behavior of the event:

SseInstr
SSE instructions (SSE, SSE2, SSE3, SSSE3, SSE4A, SSE41,
SSE42, AVX).

MmxInstr
MMX instructions.

X87Instr
x87 instructions

ExRetCond
Core::X86::Pmc::Core::ExRetCond - Retired Conditional Branch
Instructions

ExDivBusy
Core::X86::Pmc::Core::ExDivBusy - Div Cycles Busy count

ExDivCount
Core::X86::Pmc::Core::ExDivCount - Div Op Count
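
Neither divider event is described further in this page. Assuming
ExDivBusy counts cycles in which the divider is busy and ExDivCount counts
divide Ops, as their summaries suggest, their quotient gives an average
number of busy cycles per divide. The function name and sample counts are
hypothetical.

```python
def avg_div_busy_cycles(ex_div_busy, ex_div_count):
    """Average divider-busy cycles per divide Op, or 0.0 with no divides."""
    return ex_div_busy / ex_div_count if ex_div_count else 0.0


# Hypothetical counts for ExDivBusy and ExDivCount.
average = avg_div_busy_cycles(4200, 200)
```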

ExTaggedIbsOps
Core::X86::Pmc::Core::ExTaggedIbsOps - Tagged IBS Ops

This event has the following units which may be used to modify the
behavior of the event:

IbsCountRollover
Number of times an op could not be tagged by IBS because of
a previous tagged op that has not retired.

IbsTaggedOpsRet
Number of Ops tagged by IBS that retired

IbsTaggedOps
Number of Ops tagged by IBS

ExRetFusBrnchInst
Core::X86::Pmc::Core::ExRetFusBrnchInst - Retired Fused Branch
Instructions

The number of fused branch instructions retired per cycle.
The number of events logged per cycle can vary from 0 to 3.

L2RequestG1
Core::X86::Pmc::Core::L2RequestG1 - Requests to L2 Group1

This event has the following units which may be used to modify the
behavior of the event:

RdBlkL

RdBlkX

LsRdBlkC_S

CacheableIcRead

ChangeToX

PrefetchL2
Assume core should also count these and allow the breakdown
between H/W vs. S/W and LS vs. IC.

L2HwPf

OtherRequests
Events covered by Core::X86::Pmc::Core::L2RequestG2.

L2RequestG2
Core::X86::Pmc::Core::L2RequestG2 - Requests to L2 Group2

Multi-events in that LS and IF requests can be received
simultaneously.

This event has the following units which may be used to modify the
behavior of the event:

Group1 All Group 1 commands not in unit0.

LsRdSized
RdSized, RdSized32, RdSized64.

LsRdSizedNC
RdSizedNC, RdSized32NC, RdSized64NC.

IcRdSized

IcRdSizedNC

SmcInval

BusLocksOriginator

BusLocksResponses

L2Latancy
Core::X86::Pmc::Core::L2Latancy - L2 Latency

Total cycles spent waiting for L2 fills to complete from L3 or
memory, divided by four. This may be used to calculate average
latency by multiplying this count by four and then dividing by the
total number of L2 fills (unit mask
Core::X86::Pmc::Core::L2RequestG1 == FEh). Event counts are for
both threads. To calculate average latency, the number of fills
from both threads must be used.

This event has the following units which may be used to modify the
behavior of the event:

L2CyclesWaitingOnFills
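
The average latency calculation described above can be written out as
follows. The fill count is the L2RequestG1 count taken with unit mask FEh,
summed over both threads as the description requires. The function name
and sample values are hypothetical.

```python
def avg_l2_fill_latency(l2_latency_count, l2_fills):
    """Average cycles per L2 fill.

    l2_latency_count is the raw L2Latancy count, which the hardware has
    already divided by four, so it is multiplied by four here.
    l2_fills is the total number of L2 fills from both threads.
    """
    if l2_fills == 0:
        return 0.0
    return l2_latency_count * 4 / l2_fills


# Hypothetical counts: 2500 counter increments and 1000 fills.
latency_cycles = avg_l2_fill_latency(2500, 1000)
```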

L2WbcReq
Core::X86::Pmc::Core::L2WbcReq - LS to L2 WBC requests

This event has the following units which may be used to modify the
behavior of the event:

WcbWrite

WcbClose

CacheLineFlush

I_LineFlush

ZeroByteStore
This becomes WriteNoData at SDP; this count does not
include DVM Sync Ops and bus locks which are counted in
Core::X86::Pmc::Core::L2RequestG2.

LocalIcClr
Local IC Clear

CLZero Cache Line Zero

L2CacheReqStat
Core::X86::Pmc::Core::L2CacheReqStat - Core to L2 Cacheable Request
Access Status

This event does not count accesses to the L2 cache by the L2
prefetcher, but it does count accesses by the L1 prefetcher.

This event has the following units which may be used to modify the
behavior of the event:

LsRdBlkCS
LS ReadBlock C/S Hit

LsRdBlkLHitX
LS Read Block L Hit X

LsRdBlkLHitS
LsRdBlkL Hit Shared

LsRdBlkX
LsRdBlkX/ChgToX Hit X. Count RdBlkX finding Shared as a
Miss.

LsRdBlkC
LS Read Block C S L X Change to X Miss

IcFillHitX
IC Fill Hit Exclusive Stale

IcFillHitS
IC Fill Hit Shared

IcFillMiss
IC Fill Miss

L2FillPending
Core::X86::Pmc::Core::L2FillPending - Cycles with fill pending from
L2

Total cycles spent with one or more fill requests in flight from
L2.

This event has the following units which may be used to modify the
behavior of the event:

L2FillBusy.
