| Issue Type: | Bug |
|---|---|
| Priority: | 2 - Critical |
| Status: | Resolved |
| Created at: | 2017-10-09T00:51:41.000Z |
| Updated at: | 2018-02-12T18:57:41.810Z |
| Created by: | Former user |
| Reported by: | Former user |
| Assigned to: | Former user |
Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-10-27T18:24:26.000Z)
2017-11-09 Edge (Release Date: 2017-11-09)
We deployed a new Manatee image which included a patch to PostgreSQL 9.6.3 to use ISM instead of POSIX shared memory. Most shards in the DC were fine, but shard 1 (SPC ap-southeast) was decidedly not.
The machine on which the primary for the shard is deployed (HA7S8HND2) has a high load average (~40+) without any processes (seen via prstat) obviously responsible. Looking at a 5-second mpstat sample:
```
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  590   0  265  2372  153  225    6   54   50    0   449    2  74   0  24
  1  782   0   89   163    4   56    2   17   52    0   178    1  85   0  14
  2 1215   0  265   375   82  368   20   76  148    0  1147    4  66   0  30
  3  313   0   52   256   40  190    8   44   18    0   357    1  75   0  23
  4  811   0  168   272   39  207   15   55   44    0   556    1  83   0  16
  5  625   0   98   171    7   38    2   10   20    0   120    0  89   0  10
  6  277   0   87   255   34  167    4   34  274    0   272    1  74   0  25
  7  979   0  157   286   58  190    4   32  125    0   591    2  76   0  23
  8  469   0   82   227   27  112    1   19   61    0   336    1  80   0  19
  9  413   0   45   245   55   98    2   12   29    0   257    1  79   0  20
 10  471   0   50   224   28   98    2   21   70    0   243    1  79   0  20
 11  899   0  195   254   46  220    4   49  125    0   425    1  88   0  11
 12  555   0  264   312   67  280    7   69   70    0   363    1  75   0  24
 13 1592   0  214   294   62  191   17   29   45    0   706    2  75   0  23
 14  669   0  175   630  376  298   21   49  102    0   329    1  74   0  25
 15 1035   0  166   544  337  140   16   23  101    0   297    1  77   0  22
 16  330   0   73   562  349  199    5   30  150    0   120    0  79   0  20
 17  167   0   94   456  272  112    2   15   15    0    92    0  86   0  14
 18 1208   0  241   744  389  630   22  100  154    0  1157    3  61   0  36
 19  380   0   50   474  277  145    6   19   21    0   123    0  76   0  23
 20  776   0  225   610  368  160   30   24   90    0   340    2  87   0  11
 21  469   0   68   292  111  123    3   14   12    0   119    0  78   0  22
 22  437   0  108   388  139  260   15   50   47    0   690    3  69   0  29
 23  313   0   39   296  116   46    2   13   13    0   113    0  85   0  14
 24  470   0   44   282  107   53    1   17   40    0   267    1  82   0  17
 25  643   0   91   291   67  309    4   45  128    0   691    2  62   0  37
 26  961   0  282   262   39  261    7   59   92    0   931    4  56   0  40
 27 1103   0  143   424  123  508    7   79  159    0  1005    2  56   0  41
 28  691   0  115   503  286  192   36   51   64    0   560    3  63   0  34
 29  436   0   90   359  156  229    4   29   33    0   374    1  64   0  34
 30  598   0  180   361  139  208    8   53  151    0   477    2  64   0  35
 31  953   0  131   291  117  141    3   27   52    0   368    1  71   0  28
 32  795   0  249   277   38  306   11   60  138    0   658    3  59   0  38
 33  745   0  185   422  161  283   16   53   71    0   680    3  57   0  41
 34  853   0  172   257   33  171   17   32   38    0   441    2  72   0  26
 35  621   0   83   213   25  177    4   34  190    0   319    1  60   0  39
 36  533   0   98   225   25  101    2   14   36    0   329    1  78   0  20
 37  776   0   80   205   18  132    7   27   45    0   237    1  66   0  34
 38  349   0   62   297   55  162    1   16   14    0   195    0  80   0  20
 39  341   0   44   263   38  230    4   48   81    0   153    0  63   0  36
 40  469   0   85   190   11   60    9   19    9    0   400    1  89   0  10
 41  809   0  101   423  120  397   14   51   22    0  1284    4  58   0  38
 42  870   0  131   188   13   69    3   17   65    0   319    1  75   0  25
 43  687   0  193   290   63  281    2   48   53    0   453    1  54   0  45
 44  591   0  105   241   43  116    1   12   23    0   195    0  79   0  21
 45  527   0  171   267   45  208    6   43   37    0   639    2  69   0  30
 46  956   0  188   384  146  292    8   64  115    0   734    3  55   0  42
 47  647   0   95   213   34  181    2   35   63    0   464    1  48   0  51
 48  889   0  221   342   77  298   18   59   64    0   627    1  74   0  24
 49  156   0   75   296   81  253    2   34   34    0   313    1  61   0  39
 50  653   0  106   234   34  132    6   26   19    0   321    1  76   0  23
 51  284   0   32   190   22   96    1   23   27    0    97    0  65   0  35
 52  681   0  162   352   87  322   14   54   57    0   818    3  62   0  35
 53  873   0  160   418  122  454    6   40  194    0   603    2  62   0  36
 54  529   0  131   205   11   84    5   21  143    0   192    0  84   0  16
 55  844   0  189   489  149  595   19   79   98    0   962    2  32   0  66
```
In a 60-second kernel profile, the most common stack was:

```
              unix`hati_demap_func+0x4c
              unix`hat_tlb_inval_range+0x1fe
              unix`handle_ranges+0x7c
              unix`hat_unload_callback+0x205
              unix`hat_unload+0x3e
              unix`hat_unshare+0x147
              genunix`segspt_shmunmap+0xa7
              genunix`as_unmap+0x19c
              shmsys`shm_detach+0x5b
              shmsys`shmexit+0x63
              genunix`relvm+0x19e
              genunix`proc_exit+0x4aa
              genunix`exit+0x15
              genunix`rexit+0x18
              unix`sys_syscall+0x19f
              365779
```
I took a look at shmexit() activity on the machine:
```
[root@HA7S8HND2 (ap-southeast-1a) /var/tmp]# dtrace -n 'fbt::shmexit:entry { self->t = timestamp; } fbt::shmexit:return /self->t/ { @[zonename, execname] = quantize(timestamp - self->t); self->t = 0; } tick-60s { printa(@); trunc(@); }'
dtrace: description 'fbt::shmexit:entry ' matched 3 probes
CPU     ID                    FUNCTION:NAME
  8  85774                       :tick-60s
  global                                smartos  zabbix_agentd
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@                  9
            8192 |@@@@@@@@                                 3
           16384 |@@@@@@@@@@                               4
           32768 |                                         0

  7f6d4352-c2d3-4441-8e72-37741d55d545           postgres
           value  ------------- Distribution ------------- count
       268435456 |                                         0
       536870912 |@@@@@@@@@@@@@@@@@@@@@@@                  1070
      1073741824 |@@@@@@@@@@@@@@@@@                        825
      2147483648 |                                         0
```
Note that there are many PostgreSQL processes exiting, and each of them takes 500-1000ms in shmexit().
Looking at the PostgreSQL log to see the source of the connections that are disconnecting, causing postgres to exit:
```
[root@7f6d4352 (postgres) ~]$ tail -f /var/pg/postgresql.log | awk '/disconnection:/ { print($(NF-1)); t++ } t > 250 { exit(0) }' | sort | uniq -c | sort -n
  71 host=10.160.0.82
  80 host=10.164.0.216
 100 host=10.168.0.82
```
These are the Moray instances for this shard; one per DC. Looking at one of the Moray instances, I see a lot of this in /var/log/moray.log:
```
00:29:37.901Z FATAL moray/pgpool: unable to create PG client
    Error: Connection terminated unexpectedly
        at null.<anonymous> (/opt/smartdc/moray/node_modules/pg/lib/client.js:224:28)
        at g (events.js:180:16)
        at emit (events.js:92:17)
        at Socket.<anonymous> (/opt/smartdc/moray/node_modules/pg/lib/connection.js:90:10)
        at Socket.emit (events.js:95:17)
        at TCP.close (net.js:466:12)
```
It's not clear if this is cause or effect. The PostgreSQL log has lots of odd messages like these:
```
2017-10-09 00:30:04 UTC [27248]: [3-1] user=moray,db=moray,app=[unknown],client=10.160.0.82 LOG:  could not receive data from client: Connection reset by peer
...
2017-10-09 00:30:17 UTC [18277]: [180-1] user=,db=,app=,client= LOG:  could not accept new connection: Software caused connection abort
...
2017-10-09 00:30:18 UTC [27837]: [2-1] user=[unknown],db=[unknown],app=[unknown],client=10.160.0.82 LOG:  incomplete startup packet
```
Looking at one particular short-lived connection:
```
[root@7f6d4352 (postgres) ~]$ grep -w 31906 !$
grep -w 31906 /var/pg/postgresql.log
2017-10-09 00:32:03 UTC [31906]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=10.160.0.82 LOG:  connection received: host=10.160.0.82 port=32891
2017-10-09 00:32:03 UTC [31906]: [2-1] user=moray,db=moray,app=[unknown],client=10.160.0.82 LOG:  connection authorized: user=moray database=moray
2017-10-09 00:32:03 UTC [31906]: [3-1] user=moray,db=moray,app=[unknown],client=10.160.0.82 LOG:  could not receive data from client: Connection reset by peer
2017-10-09 00:32:03 UTC [31906]: [4-1] user=moray,db=moray,app=[unknown],client=10.160.0.82 LOG:  disconnection: session time: 0:00:00.005 user=moray database=moray host=10.160.0.82 port=32891
```
It is possible that if the PostgreSQL machine is overloaded by shmexit calls, it is unable to even fully accept each Moray connection until after Moray's perceived (and presumably hair-trigger) connection timeout has already expired.
This sounds similar to the symptoms observed under MORAY-339 (though I have not seen the shmexit symptom before). PostgreSQL does a fair bit of work synchronously when accepting connections, and because Moray caps its connect timeout at 4 seconds, we've seen cases where modest delays in certain operations can bubble upstack to create this sort of mess. For example, if fork(2) starts to take even just one second, and Moray tries to create just four connections (across all processes in all zones for this shard), it's likely the fourth one will time out. If Morays try to create more than four connections (which was far more common prior to MORAY-397), this only gets worse as the aborted connections stack up in the accept queue and effectively guarantee that new connection attempts will time out. (It seems that the way Moray aborts these connections, the connection remains on the PostgreSQL accept queue.) Some of these cases were attributed to DTrace-related fork latency (e.g., OS-4742), but it's conceivable that shmexit() could get in the way as well.
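(Not something we ran here, but if the fork-latency part of this theory needs testing in the future, fork times could be quantized directly with a one-liner in the same style as the above -- assuming, as on illumos, that the fork family enters the kernel via the forksys syscall:)

```
# dtrace -n 'syscall::forksys:entry /execname == "postgres"/ { self->t = timestamp; } syscall::forksys:return /self->t/ { @["fork (ns)"] = quantize(timestamp - self->t); self->t = 0; } tick-60s { printa(@); trunc(@); }'
```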
In investigating this, we were spending all of our time in TLB shootdown; time spent in hati_demap_func, by length:
```
[root@HA7S8HND2 (ap-southeast-1a) ~]# dtrace -n hati_demap_func:entry'{self->in = timestamp; self->len = arg2}' -n hati_demap_func:return'/self->in/{@[self->len] = quantize(timestamp - self->in); self->in = 0; self->len = 0}'
dtrace: description 'hati_demap_func:entry' matched 1 probe
dtrace: description 'hati_demap_func:return' matched 1 probe
dtrace: 156 dynamic variable drops with non-empty dirty list
dtrace: 415 dynamic variable drops with non-empty dirty list
^C
          135168
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
            4096 |                                         0

               0
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
            4096 |                                         0

           49152
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4
            2048 |                                         0

           61440
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
            8192 |                                         0

          233472
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
            8192 |                                         0

          368640
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
            8192 |                                         0

           53248
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6
            2048 |                                         0

          122880
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3
            4096 |                                         0

           65536
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
            8192 |                                         0

          163840
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           16384 |                                         0

          598016
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           16384 |                                         0

          167936
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@@@@@@@@@@@@                     1
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@                     1
           16384 |                                         0

          229376
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           32768 |                                         0

          466944
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           32768 |                                         0

          561152
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           32768 |                                         0

         1015808
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           32768 |                                         0

         1044480
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           32768 |                                         0

          221184
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 5
            8192 |                                         0

          102400
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     9
            4096 |@@@@                                     1
            8192 |                                         0

          794624
           value  ------------- Distribution ------------- count
           16384 |                                         0
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           65536 |                                         0

         1003520
           value  ------------- Distribution ------------- count
           16384 |                                         0
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           65536 |                                         0

          299008
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        5
            8192 |                                         0
           16384 |@@@@@@@                                  1
           32768 |                                         0

          393216
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        5
            8192 |                                         0
           16384 |@@@@@@@                                  1
           32768 |                                         0

           36864
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              14
            2048 |@@                                       1
            4096 |@@@@@@                                   3
            8192 |@@@@@@                                   3
           16384 |                                         0

         2064384
           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
          131072 |                                         0

          864256
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@                     2
           16384 |                                         0
           32768 |@@@@@@@@@@@@@@@@@@@@                     2
           65536 |                                         0

         1048576
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |@@@@@@@@@@                               1
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          3
           65536 |                                         0

           45056
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@                                       3
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         42
            4096 |@@@@@@@                                  9
            8192 |                                         0

         2097152
           value  ------------- Distribution ------------- count
           16384 |                                         0
           32768 |@@@@@@@@@@@@@@@@@@@@                     2
           65536 |@@@@@@@@@@@@@@@@@@@@                     2
          131072 |                                         0

           20480
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |                                         2
            1024 |@@@@@@@                                  96
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@                340
            4096 |@@@@@@@@                                 106
            8192 |                                         4
           16384 |                                         1
           32768 |                                         0

          696320
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@                    78
           16384 |@@@@@@@                                  26
           32768 |@@@@@@@@@@@@                             43
           65536 |                                         0

         7090176
           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |@@@@@@@@                                 66
          131072 |@@@@@@@@                                 66
          262144 |@@@@@@@@@@@@@@@@@@@@@@@@                 199
          524288 |                                         0

            4096
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@@@@                          58044
            1024 |@@@@@@@@@@@@@@                           53784
            2048 |@@@@@@@@@                                36784
            4096 |@                                        5338
            8192 |                                         1737
           16384 |                                         112
           32768 |                                         63
           65536 |                                         1
          131072 |                                         0

       874512384
           value  ------------- Distribution ------------- count
         4194304 |                                         0
         8388608 |@@@@@@@@@@@@@@                           64
        16777216 |@@@@@                                    25
        33554432 |@@@@@@@@@@@@@@@@@@@@@                    95
        67108864 |                                         0

     34359738368
           value  ------------- Distribution ------------- count
       268435456 |                                         0
       536870912 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              93
      1073741824 |@@@@@@@@@@@@@                            45
      2147483648 |                                         0
```
This is despite the fact that the ISM segment is on large pages; from the pmap -s output of one of the postgres processes:

```
FFFFBF7780000000   33554432K     1G rwxsR    [ ism shmid=0x4 ]
FFFFBF7F80000000     854016K     2M rwxsR    [ ism shmid=0x4 ]
FFFFBF7FB4200000        536K     4K rwxsR    [ ism shmid=0x4 ]
```
The issue here is that hati_demap_func does not take pagesize into account -- which means that for the 32G segment, which consists of 32 1G pages, we are performing over 8 million invlpg instructions where we should do only 32! (32G in 4K steps is 8,388,608 invalidations; at ~100 ns apiece, that is roughly 0.8 seconds per unmap, which lines up with the 500-1000ms shmexit() times above. Nor is it surprising that these instructions take ~100 ns: invlpg is not an instruction designed to be called millions of times per second.) As an experiment, we forced hati_demap_func to always just invalidate the entire TLB (regardless of length):
```
[root@HA7S8HND2 (ap-southeast-1a) ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs sd mm lofs idm mpt_sas sata crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> hati_demap_func::dis
hati_demap_func:                pushq  %rbp
hati_demap_func+1:              movq   %rsp,%rbp
hati_demap_func+4:              subq   $0x20,%rsp
hati_demap_func+8:              movq   %rdx,-0x18(%rbp)
hati_demap_func+0xc:            movq   %rsi,-0x10(%rbp)
hati_demap_func+0x10:           movq   %rdi,-0x8(%rbp)
hati_demap_func+0x14:           cmpq   %rdi,+0x4181f5(%rip)     <kas+0x10>
hati_demap_func+0x1b:           je     +0x1b    <hati_demap_func+0x38>
hati_demap_func+0x1d:           movq   %gs:0x10,%rax
hati_demap_func+0x26:           cmpq   %rdi,0x660(%rax)
hati_demap_func+0x2d:           je     +0x9     <hati_demap_func+0x38>
hati_demap_func+0x2f:           xorl   %eax,%eax
hati_demap_func+0x31:           leave
hati_demap_func+0x32:           ret
hati_demap_func+0x33:           nopl   0x0(%rax,%rax)
hati_demap_func+0x38:           cmpq   $-0x1,%rsi       <0xffffffffffffffff>
hati_demap_func+0x3c:           je     +0x22    <hati_demap_func+0x60>
hati_demap_func+0x3e:           testq  %rdx,%rdx
hati_demap_func+0x41:           je     -0x14    <hati_demap_func+0x2f>
hati_demap_func+0x43:           xorl   %eax,%eax
hati_demap_func+0x45:           nopl   (%rax)
hati_demap_func+0x48:           invlpg (%rax,%rsi)
hati_demap_func+0x4c:           addq   $0x1000,%rax
hati_demap_func+0x52:           cmpq   %rax,%rdx
hati_demap_func+0x55:           ja     -0xf     <hati_demap_func+0x48>
hati_demap_func+0x57:           xorl   %eax,%eax
hati_demap_func+0x59:           leave
hati_demap_func+0x5a:           ret
hati_demap_func+0x5b:           nopl   0x0(%rax,%rax)
hati_demap_func+0x60:           testb  $0x2,0x58(%rdi)
hati_demap_func+0x64:           je     +0x3f    <hati_demap_func+0xa5>
hati_demap_func+0x66:           movq   %gs:0x10,%rax
hati_demap_func+0x6f:           movq   0x668(%rax),%rax
hati_demap_func+0x76:           movq   0x90(%rdi),%rdx
hati_demap_func+0x7d:           movq   0x18(%rax),%rax
hati_demap_func+0x81:           movq   %rdx,(%rax)
hati_demap_func+0x84:           movq   0x98(%rdi),%rdx
hati_demap_func+0x8b:           movq   %rdx,0x8(%rax)
hati_demap_func+0x8f:           movq   0xa0(%rdi),%rdx
hati_demap_func+0x96:           movq   %rdx,0x10(%rax)
hati_demap_func+0x9a:           movq   0xa8(%rdi),%rdx
hati_demap_func+0xa1:           movq   %rdx,0x18(%rax)
hati_demap_func+0xa5:           movq   %cr3,%rax
hati_demap_func+0xa8:           movq   %rax,%cr3
>
> hati_demap_func+0x3c/Bna
hati_demap_func+0x3c:           74
hati_demap_func+0x3d:
> hati_demap_func+0x3c/v eb
mdb: failed to write eb at address 0xfffffffffb8214dc: target is not open for writing
> $W
> hati_demap_func+0x3c/v eb
hati_demap_func+0x3c:           0x74    =       0xeb
```
(The write above turns the je at hati_demap_func+0x3c -- the branch otherwise taken only for a whole-address-space invalidation -- into an unconditional jmp to the code at +0x60, which reloads %cr3 and thereby flushes the entire TLB.) This resolved the issue as immediately seen -- but some number of minutes later, the machine reset for reasons not understood (though this may well be related to the in-place modification). Based on what we saw before the reset, we can conclude that -- at the very least -- this is an exacerbating condition, whereby an initial perturbation could leave the system in a state from which it couldn't recover. The source of any initial perturbation, as well as its likelihood, is not yet determined -- but in principle this was possible on 9.2 as well, albeit less likely due to the smaller buffer size.
To propagate the page size down to hati_demap_func, I have slightly changed the behavior of hat_tlb_inval_range to accept a range_info_t instead of an address and length. From the information in a range_info_t, the page size can be calculated directly.
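For illustration, a minimal sketch of what a pagesize-aware invalidation loop looks like -- not the verbatim change (see the CR below). The rng_cnt and rng_level fields are visible in the D script further down; the starting-address field name (rng_va here) and the LEVEL_SIZE()/mmu_invlpg() helpers are assumed from the surrounding illumos hat code:

```c
/*
 * Sketch only: a pagesize-aware demap loop. rng_cnt and rng_level are the
 * range_info_t fields shown in the D script below; rng_va (the starting
 * virtual address), LEVEL_SIZE() and mmu_invlpg() are assumed names.
 */
static void
demap_range_sketch(range_info_t *rng)
{
	size_t pgsz = LEVEL_SIZE(rng->rng_level);	/* 4K, 2M, or 1G */
	uintptr_t va = rng->rng_va;
	ulong_t cnt = rng->rng_cnt;

	/*
	 * The old loop stepped over the byte length in 4K (MMU_PAGESIZE)
	 * increments -- 8,388,608 iterations for a 32G segment of 1G
	 * pages. Stepping by the actual page size makes that 32.
	 */
	while (cnt-- != 0) {
		mmu_invlpg((caddr_t)va);
		va += pgsz;
	}
}
```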
Here is the CR: https://cr.joyent.us/#/c/2783/
To test the fix, I wrote a small benchmark to measure hati_demap_func run time. tlbshoot.c creates a 32GB SysV shared memory segment, hopefully made up of 1GB pages, and then attaches to and detaches from that segment 1000 times. Each of those detaches will cause a call to hati_demap_func. To measure this I have also written a D script that shows what range is requested and, after a ^C, prints a histogram of how long the subsequent hati_demap_func calls took in nanoseconds. tlbshoot.c can be found here: https://gist.github.com/gwydirsam/6f25309b720350b6883dc2cd1f398349
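The gist is the authoritative version of the benchmark, but its approach amounts to roughly the following (a sketch; it assumes SHM_SHARE_MMU is what requests ISM, and that the kernel will back a segment of this size with 1G pages where it can):

```c
/* Rough sketch of tlbshoot.c's approach -- see the gist for the real thing. */
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>

#define	SEG_SIZE	(32ULL * 1024 * 1024 * 1024)	/* 32GB */
#define	ITERATIONS	1000

int
main(void)
{
	/* One large segment; the kernel will hopefully use 1G pages. */
	int id = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);

	if (id == -1) {
		perror("shmget");
		return (1);
	}

	for (int i = 0; i < ITERATIONS; i++) {
		/* SHM_SHARE_MMU requests ISM -- the case at issue here. */
		void *p = shmat(id, NULL, SHM_SHARE_MMU);

		if (p == (void *)-1) {
			perror("shmat");
			return (1);
		}

		/* Each detach unshares the segment and triggers the demap. */
		if (shmdt(p) != 0) {
			perror("shmdt");
			return (1);
		}
	}

	(void) shmctl(id, IPC_RMID, NULL);
	return (0);
}
```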
The D script is a bit shorter:
```
#!/usr/sbin/dtrace -s

#pragma D option quiet

fbt::hat_tlb_inval_range:entry
/execname == "tlbshoot-1GB"/
{
	printf("hat_tlb_inval_range: range: cnt:%d lvl:%d\n",
	    args[1]->rng_cnt, args[1]->rng_level);
}

fbt::hati_demap_func:entry
/execname == "tlbshoot-1GB"/
{
	self->ts = timestamp;
	printf("hati_demap_func: len:%u\n", args[2]);
}

fbt::hati_demap_func:return
/self->ts/
{
	@[execname, "ns"] = quantize(timestamp - self->ts);
	self->ts = 0;
}
```
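(The predicate matches an executable named tlbshoot-1GB -- presumably the name under which tlbshoot.c was built. Run the script in one shell and the benchmark in another; the script prints each requested range as it happens, and ^C produces the latency histogram.)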
These results are very promising. Before the fix, we can see that the large invalidations in hati_demap_func consistently took ~.26 seconds:
```
  tlbshoot-1GB                                        ns
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@                 3016
            2048 |                                         6
            4096 |                                         8
            8192 |                                         1
           16384 |                                         1
           32768 |                                         0
           65536 |                                         0
          131072 |                                         0
          262144 |                                         0
          524288 |                                         1
         1048576 |                                         0
         2097152 |                                         0
         4194304 |                                         0
         8388608 |@@@@@@@@                                 1000
        16777216 |                                         0
        33554432 |                                         0
        67108864 |                                         0
       134217728 |                                         0
       268435456 |@@@@@@@@                                 1000
       536870912 |                                         0
```
Afterwards, we can see that hati_demap_func now takes ~16us -- consistent with issuing one invlpg per 1G page (32 of them) rather than one per 4K!
```
  tlbshoot-1GB                                        ns
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@                 3028
            2048 |@@@@@@@@                                 1001
            4096 |                                         3
            8192 |                                         1
           16384 |@@@@@@@@                                 999
           32768 |                                         4
           65536 |                                         1
          131072 |                                         0
          262144 |                                         0
          524288 |                                         1
         1048576 |                                         0
```
illumos-joyent commit 9b585df (branch master, by Sam Gwydir)
OS-6390 hati_demap_func should take pagesize into account
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Tim Kordas <tim.kordas@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
A small side-note: this change isn't quite right for the HAT_SHARED case, where we change "va" but not the range_info_t passed into the cross-call. This doesn't seem to have actually hurt anything, though -- and besides, I've since indirectly fixed it as part of OS-6546.