OS-8678

Get rid of disabling deep-C-states

Status:
Resolved
Created:
2025-07-14T14:15:05.791-0400
Updated:
2025-07-22T11:34:42.190-0400

Description

This essentially backs out OS-960.

This email thread, T6648519f61bdb13a-Mf7bb887e6ad4ba283d1000d1, explains things rather well, especially its last comment: disabling deep C-states was trauma from Nehalem and Westmere, and we’re long past those late-2000s models.
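As a rough illustration of what disabling deep C-states means for an idle CPU, here is a minimal Python model of an idle loop picking a C-state. It is a sketch under stated assumptions, not the illumos implementation: the `no_deep_c` flag stands in for the `idle_cpu_no_deep_c` setting named later in this thread, and `CState`, `pick_state`, and the exit-latency values are hypothetical. The model picks the deepest allowed state whose wake-up latency fits the predicted idle period, and caps the choice at C1 when the flag is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str             # e.g. "C1", "C2"
    depth: int            # 1 = shallowest idle state (C1)
    exit_latency_us: int  # hypothetical wake-up latency in microseconds


def allowed_states(states, no_deep_c):
    """States the idle loop may enter; only C1 when deep C-states are disabled."""
    if no_deep_c:
        return [s for s in states if s.depth <= 1]
    return list(states)


def pick_state(states, predicted_idle_us, no_deep_c):
    """Pick the deepest allowed state whose exit latency fits the predicted idle time."""
    fits = [s for s in allowed_states(states, no_deep_c)
            if s.exit_latency_us <= predicted_idle_us]
    if not fits:
        # Nothing fits the latency budget: fall back to the shallowest state.
        return min(states, key=lambda s: s.depth)
    return max(fits, key=lambda s: s.depth)


# Hypothetical state table, loosely shaped like a CPU exposing C1 and C2.
STATES = [CState("C1", 1, 1), CState("C2", 2, 100)]

print(pick_state(STATES, 1000, no_deep_c=True).name)   # C1
print(pick_state(STATES, 1000, no_deep_c=False).name)  # C2
print(pick_state(STATES, 10, no_deep_c=False).name)    # C1
```

With the flag set, even a long idle period only ever reaches C1, which matches the AMD powertop runs in the comments below: C2 residency appears only once the setting is removed.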

Comments (3)

Dan McDonald commented on 2025-07-16T01:03:25.919-0400 (edited 2025-07-16T01:05:29.164-0400):

Tested on Kebecloud CNs, which are all Haswell-E (Xeon E v3) machines, and on one Tiger Lake NUC. No degradation in smartos-live builds, nor in any other operations as far as I saw. Willing to run specific tests if need be.

Carlos Neira commented on 2025-07-20T21:38:27.121-0400:

I’m running this on a

 x86 (GenuineIntel 106CA family 6 model 28 step 10 clock 1500 MHz)
      Intel(r) Atom(tm) CPU N550   @ 1.50GHz

There were no problems at all. The machine is used to host several services at home, and no degradation in services was observed when running this change.

According to https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/atom-n400-vol-1-datasheet-.pdf, this CPU supports C1, C2, and C4 states. Running powertop shows the following.

C-states (idle power)   Avg     Residency          P-states (frequencies)
C0 (cpu running)                (0.0%)             1000 Mhz        100.0%
C1                      1.5ms   (100.0%)           1500 Mhz        0.0%

Wakeups-from-idle per second: 3213.9    interval: 5.0s
Power usage (ACPI estimate): 0.000W (running on AC power, fully charged)

Top causes for wakeups:
48.4% (1554.0)                 sched :  <xcalls> unix`dtrace_xcall_func
39.6% (1273.2)              <kernel> :  genunix`clock
 5.6% (181.0)               <kernel> :  genunix`cv_wakeup
 2.0% ( 63.6)               <kernel> :  c2audit`au_queue_kick
 2.0% ( 63.6)               <kernel> :  SDC`sysdc_update
 0.6% ( 19.8)               <kernel> :  uhci`uhci_handle_root_hub_status_change
 0.3% ( 11.0)               <kernel> :  genunix`delay_wakeup
 0.2% (  5.0)               <kernel> :  uhci`uhci_cmd_timeout_hdlr
 0.2% (  5.0)               <kernel> :  genunix`schedpaging
 0.2% (  5.0)               <kernel> :  ehci`ehci_handle_root_hub_status_change
 0.2% (  5.0)               <kernel> :  cpudrv`cpudrv_monitor_disp
 0.1% (  4.8)                  sched :  <xcalls> unix`speedstep_pstate_transition
 0.1% (  4.0)            <interrupt> :  atge#0
 0.1% (  2.6)               <kernel> :  ipf`fr_slowtimer
 0.1% (  2.6)               <kernel> :  genunix`clock_realtime_fire
 0.0% (  1.4)               <kernel> :  FSS`fss_update
 0.0% (  1.2)               <kernel> :  acpi_drv`acpi_drv_cbat_rescan
 0.0% (  1.2)            <interrupt> :  ehci#0
 0.0% (  1.2)               <kernel> :  TS`ts_update
 0.0% (  1.2)            <interrupt> :  uhci#1
 0.0% (  1.2)            <interrupt> :  uhci#0
 0.0% (  1.2)            <interrupt> :  uhci#3
 0.0% (  1.2)            <interrupt> :  uhci#2
 0.0% (  1.0)               <kernel> :  genunix`kmem_update
 0.0% (  0.8)               <kernel> :  sd`sd_pm_idletimeout_handler
 0.0% (  0.6)               <kernel> :  ip`mld_slowtimo
 0.0% (  0.4)               <kernel> :  ip`igmp_slowtimo
 0.0% (  0.2)               <kernel> :  kcf`rnd_handler
 0.0% (  0.2)                  sched :  <xcalls> unix`hati_demap_func
 0.0% (  0.2)            <interrupt> :  ahci#0
 0.0% (  0.2)               <kernel> :  ahci`ahci_watchdog_handler
 0.0% (  0.2)               <kernel> :  genunix`vmem_update
 0.0% (  0.2)               <kernel> :  swrand`rnd_handler

On an AMD system, running this change gives the following (the C2 state was not present without this change):

CPU

x86 (AuthenticAMD A70F52 family 25 model 117 step 2 clock 4100 MHz)
      AMD Ryzen 7 8700F 8-Core Processor        [ Socket: AM5 ]

PowerTop with change:

C-states (idle power)   Avg     Residency          P-states (frequencies)
C0 (cpu running)                (13.5%)            1600 Mhz        87.5%
C1                      0.0ms   (6.7%)             2200 Mhz        0.0%
C2                      0.9ms   (79.8%)            4100 Mhz(turbo) 12.5%

Wakeups-from-idle per second: 286610.7  interval: 5.0s
no ACPI power usage estimate available

Top causes for wakeups:
 5.3% (15239.9)                sched :  <xcalls> unix`dtrace_xcall_func
 0.3% (1000.6)              <kernel> :  genunix`clock
 0.0% (122.4)               <kernel> :  genunix`cv_wakeup
 0.0% ( 50.0)               <kernel> :  c2audit`au_queue_kick
 0.0% ( 50.0)               <kernel> :  SDC`sysdc_update
 0.0% ( 27.0)            <interrupt> :  nvme#1
 0.0% ( 27.0)            <interrupt> :  nvme#0
 0.0% ( 16.0)               <kernel> :  cpudrv`cpudrv_monitor_disp
 0.0% (  4.0)               <kernel> :  ipf`fr_slowtimer
 0.0% (  4.0)               <kernel> :  genunix`schedpaging
 0.0% (  4.0)               <kernel> :  igb`igb_local_timer
 0.0% (  3.2)               <kernel> :  genunix`clock_realtime_fire
 0.0% (  2.0)            <interrupt> :  igb#0
 0.0% (  1.0)               <kernel> :  FSS`fss_update
 0.0% (  1.0)               <kernel> :  acpi_drv`acpi_drv_cbat_rescan
 0.0% (  1.0)               <kernel> :  unix`memscrub_wakeup
 0.0% (  1.0)               <kernel> :  TS`ts_update
 0.0% (  0.6)               <kernel> :  genunix`kmem_update
 0.0% (  0.6)            <interrupt> :  igb#1
 0.0% (  0.4)               <kernel> :  ahci`ahci_watchdog_handler
 0.0% (  0.4)               <kernel> :  ip`mld_slowtimo
 0.0% (  0.2)               <kernel> :  ip`tcp_timer_callback
 0.0% (  0.2)               <kernel> :  kcf`rnd_handler

PowerTop without change:

C-states (idle power)   Avg     Residency          P-states (frequencies)
C0 (cpu running)                (58.8%)            1600 Mhz        87.8%
C1                      0.0ms   (41.2%)            2200 Mhz        0.0%
                                                   4100 Mhz(turbo) 12.2%

Wakeups-from-idle per second: 389093.0  interval: 0.1s
no ACPI power usage estimate available

Top causes for wakeups:
 3.8% (14803.5)                sched :  <xcalls> unix`dtrace_xcall_func
 0.3% (1035.4)              <kernel> :  genunix`clock
 0.0% (140.3)            <interrupt> :  igb#0
 0.0% ( 73.5)               <kernel> :  genunix`cv_wakeup
 0.0% ( 53.4)               <kernel> :  SDC`sysdc_update
 0.0% ( 46.8)               <kernel> :  c2audit`au_queue_kick
 0.0% ( 13.4)               <kernel> :  ip`tcp_timer_callback
 0.0% ( 13.4)               <kernel> :  genunix`clock_realtime_fire
 0.0% (  6.7)               <kernel> :  FSS`fss_update
 0.0% (  6.7)               <kernel> :  genunix`schedpaging
 0.0% (  6.7)               <kernel> :  ipf`fr_slowtimer
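The two AMD runs can be compared with simple arithmetic on the C-state residency columns. The sketch below uses only those percentages, copied from the powertop output above, because the two runs used different sampling intervals (5.0s with the change, 0.1s without), so the wakeup rates are not directly comparable. The helper names `idle_share` and `deep_share` are illustrative, not part of powertop.

```python
import math

# C-state residency percentages from the two AMD Ryzen 7 8700F powertop runs.
WITH_CHANGE = {"C0": 13.5, "C1": 6.7, "C2": 79.8}
WITHOUT_CHANGE = {"C0": 58.8, "C1": 41.2}


def idle_share(residency):
    """Percent of sampled time in any idle state (everything except C0)."""
    return sum(v for k, v in residency.items() if k != "C0")


def deep_share(residency):
    """Percent of sampled time in C2 or deeper."""
    return sum(v for k, v in residency.items() if k not in ("C0", "C1"))


for label, res in (("with change", WITH_CHANGE), ("without change", WITHOUT_CHANGE)):
    # Each run's residencies should account for all of the sampled time.
    assert math.isclose(sum(res.values()), 100.0)
    print(f"{label}: idle {idle_share(res):.1f}%, C2+ {deep_share(res):.1f}%")
```

This prints `with change: idle 86.5%, C2+ 79.8%` and `without change: idle 41.2%, C2+ 0.0%`: with the change, most of the idle time on this box is spent in C2.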

Dan McDonald commented on 2025-07-21T10:55:30.313-0400:

From the community’s Jop Zinkweg via the TritonDataCenter Discord’s #smartos channel:

• Jop — 5:35 AM

We've been running @danmcd's custom image with the removed
idle_cpu_no_deep_c setting in production without issues on
a variety of hardware, including AMD EPYC 7351, Intel Xeon
4114/4216/6230/4310 and a few ancient E5-2620's that are
still used for our headnodes. If it's useful data I can add
it to the pull request.

Thanks Jop for the useful data.
