OS-7124: ::xcall would be useful

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2018-08-14T09:53:58.785Z
Updated at:2019-09-04T09:42:51.843Z

People

Created by:Former user
Reported by:Former user
Assigned to:Former user

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-08-17T15:14:08.142Z)

Fix Versions

2018-08-30 Zolom Swamp (Release Date: 2018-08-30)

Description

Many times recently, we've had a crash dump where all the CPUs get stuck. Often this is most easily visible as xcall state, where a master CPU is stuck in xc_serv() waiting for one or more slave CPUs to respond.

It's somewhat painful to observe xcall state, so to make state clearer, an ::xcall dcmd would be useful. This will collate all active xc_msg structs under the relevant master CPU, making it clear for each master what other CPUs it is waiting for and why.
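As a rough illustration of the collation described above, here is a minimal standalone sketch in C. It is not the dcmd itself: `msg_rec_t`, `collate_by_master()` and `NCPU_SKETCH` are hypothetical names invented for this example, and the records stand in for whatever xc_msg state the dcmd reads from the dump.

```c
#include <assert.h>
#include <stddef.h>

#define	NCPU_SKETCH	64	/* hypothetical CPU limit for this sketch */

/* Hypothetical flattened view of one outstanding cross-call message. */
typedef struct msg_rec {
	int mr_master;	/* CPU that initiated the cross call */
	int mr_slave;	/* CPU that is expected to respond */
} msg_rec_t;

/*
 * Count, for each master CPU, how many messages are still outstanding.
 * pend[] must have NCPU_SKETCH entries; it is zeroed before counting.
 */
static void
collate_by_master(const msg_rec_t *msgs, size_t nmsgs, int *pend)
{
	for (int i = 0; i < NCPU_SKETCH; i++)
		pend[i] = 0;

	for (size_t i = 0; i < nmsgs; i++) {
		int master = msgs[i].mr_master;

		assert(master >= 0 && master < NCPU_SKETCH);
		pend[master]++;
	}
}
```

Grouping by master rather than by slave is the point: the per-master count shows at a glance which CPU is waiting and on how many others.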

Currently, a CPU that's processing a message takes it off the ->xc_msgbox queue, and it's hence not (easily) visible in the dump state. To make this clearer, when we start to process a message, we'll place it in an ->xc_curmsg holding cell, so it's easy for ::xcall to find it.
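The holding-cell idea can be sketched as a small single-threaded model in C. This is an assumption-laden illustration, not the kernel change: the `sk_*` types and functions are hypothetical, and the real code has to handle concurrent access, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cross-call message. */
typedef struct sk_msg {
	struct sk_msg *sm_next;
	int sm_master;		/* CPU that sent the message */
} sk_msg_t;

/* Hypothetical per-CPU state: a pending queue plus the holding cell. */
typedef struct sk_cpu {
	sk_msg_t *sc_msgbox;	/* messages not yet picked up */
	sk_msg_t *sc_curmsg;	/* message being processed, or NULL */
} sk_cpu_t;

/* Queue a message for this CPU. */
static void
sk_post(sk_cpu_t *cpu, sk_msg_t *msg)
{
	msg->sm_next = cpu->sc_msgbox;
	cpu->sc_msgbox = msg;
}

/*
 * Take the next message off the queue and park it in the holding cell,
 * so that a debugger can still find it while it is being processed.
 */
static sk_msg_t *
sk_begin(sk_cpu_t *cpu)
{
	sk_msg_t *msg = cpu->sc_msgbox;

	assert(cpu->sc_curmsg == NULL);
	if (msg != NULL) {
		cpu->sc_msgbox = msg->sm_next;
		msg->sm_next = NULL;
		cpu->sc_curmsg = msg;
	}
	return (msg);
}

/* Processing is finished: empty the holding cell. */
static void
sk_end(sk_cpu_t *cpu)
{
	cpu->sc_curmsg = NULL;
}
```

Between sk_begin() and sk_end() the message is off the queue but still reachable through sc_curmsg, which is the window the dcmd needs to be able to see.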

Comments

Comment by Former user
Created at 2018-08-14T10:07:59.654Z

We also need a couple of mdb_ctf.c fixes. First, re-introduce the logic that allowed a consumer to optionally ignore missing members in mdb_ctf_vread(), so that we can fall back if a dump is missing ->xc_curmsg. Second, don't use UM_GC in that same routine: in a loop, this can easily exhaust KMDB's available space.
Comment by Former user
Created at 2018-08-14T10:29:05.044Z

Here's an example output:

[0]> 0t37::xcall
CPU PEND HANDLER
 37   47 hati_demap_func(0xfffffe2308c763f8, 0xfffffcc26ae974a0, 0)
         COMMAND   SLAVE 
         CALL      0     
         CALL      1     
         CALL      2     
         CALL      3     
         CALL      4     
         CALL      5     
         CALL      7     
         CALL      8     
         CALL      9     
         CALL      10    
         CALL      11    
         CALL      12    
         *CALL     13    
         CALL      14    
         CALL      16    
         *CALL     17    
         *CALL     19    
         CALL      20    
         *CALL     21    
         CALL      23    
         *CALL     24                 
         *CALL     25                 
         CALL      28                 
         CALL      29                 
         CALL      30                 
         CALL      32                 
         CALL      34                 
         CALL      35                 
         CALL      36                 
         CALL      37                 
         CALL      38                 
         *CALL     39                 
         CALL      40                 
         *CALL     41                 
         *CALL     42                 
         CALL      44                 
         *CALL     45                 
         *CALL     47                 
         *CALL     48                 
         *CALL     49                 
         *CALL     51                 
         CALL      52                 
         CALL      54                 
         CALL      55                 
[0]> cpu::print [0t13] | ::cpustack
hati_demap_func()
apix`apix_dispatch_by_vector+0x8c(f1)
apix`apix_dispatch_hilevel+0x15(f1, 0)
...

Comment by Jira Bot
Created at 2018-08-17T14:55:50.868Z

illumos-joyent commit c6f905766eb40c4295246520607032b0d63fe48f (branch master, by John Levon)

OS-7124 ::xcall would be useful
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>