Issue Type: | Bug |
---|---|
Priority: | 4 - Normal |
Status: | Open |
Created at: | 2019-06-24T16:11:39.866Z |
Updated at: | 2021-02-10T23:55:57.026Z |
Created by: | Former user |
---|---|
Reported by: | Former user |
    $ thoth debug e54f2b47df798a3f85147935657254d1
    ...
    > ::status
    debugging crash dump /manta/thoth/stor/thoth/e54f2b47df798a3f85147935657254d1/vmcore.0 (64-bit) from PA5DFM842
    operating system: 5.11 joyent_20180816T001857Z (i86pc)
    git branch: release-20180816
    git rev: 005d090d5d39829784302a3c8574b81b88fc69d0
    image uuid: (not set)
    panic message: BAD TRAP: type=d (#gp General protection) rp=fffffcc279a26380 addr=0
    dump content: kernel pages only
    > $C
    fffffcc279a264e0 mutex_owner_running+0xd()
    fffffcc279a26520 zfs_znode_free+0x37(fffffe7c62cc7510)
    fffffcc279a26580 zfs_inactive+0x16e(fffffe7c62cc3800, ffffff913bdd3c90, 0)
    fffffcc279a265e0 fop_inactive+0x76(fffffe7c62cc3800, ffffff913bdd3c90, 0)
    fffffcc279a26610 vn_rele+0x8a(fffffe7c62cc3800)
    fffffcc279a26880 lookuppnvp+0x3ff(fffffcc279a26960, 0, 1, 0, fffffcc279a26bc8, fffffe2779772040, fffffe2779772040, ffffff913bdd3c90)
    fffffcc279a26920 lookuppnatcred+0x176(fffffcc279a26960, 0, 1, 0, fffffcc279a26bc8, 0, ffffff913bdd3c90)
    fffffcc279a26a30 lookupnameatcred+0xdd(7fffef0104c0, 0, 1, 0, fffffcc279a26bc8, 0, ffffff913bdd3c90)
    fffffcc279a26a80 lookupnameat+0x39(7fffef0104c0, 0, 1, 0, fffffcc279a26bc8, 0)
    fffffcc279a26c30 vn_openat+0x315(7fffef0104c0, 0, 802001, 0, fffffcc279a26d40, 0, 12, 0, 3)
    fffffcc279a26da0 copen+0x204(ffd19553, 7fffef0104c0, 802001, 0)
    fffffcc279a26dd0 openat+0x2a(ffd19553, 7fffef0104c0, 800000, 0)
    fffffcc279a26e40 lx_openat+0x8b(ffffff9c, 7fffef0104c0, 80000, ef070658)
    fffffcc279a26e70 lx_open+0x25(7fffef0104c0, 80000, 7fffef070658)
    fffffcc279a26ef0 lx_syscall_enter+0x19b()
    fffffcc279a26f10 sys_syscall+0x142()
    >
The alignment of this lock is curious: znode->z_zfsvfs somehow
points one byte into a 2k allocation.
    > fffffe7c62cc7510::print -at znode_t z_zfsvfs->z_znodes_lock
    fffffe23d267d429 kmutex_t z_zfsvfs->z_znodes_lock = {
        fffffe23d267d429 void *[1] z_zfsvfs->z_znodes_lock._opaque = [ 0x4000000000000000 ]
    }
    > fffffe23d267d429::whatis
    fffffe23d267d429 is fffffe23d267d000+429, allocated from kmem_alloc_2048
    > fffffe7c62cc7510::print znode_t z_zfsvfs
    z_zfsvfs = 0xfffffe23d267d001
    > ::print -a zfsvfs_t z_znodes_lock
    428 z_znodes_lock {
        428 z_znodes_lock._opaque
    }
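The arithmetic behind that observation can be checked by hand. The sketch below is illustrative only: it uses the addresses printed above and confirms that `z_zfsvfs` plus the `z_znodes_lock` offset (0x428) lands on the misaligned lock address. The helper names are hypothetical. The canonical-address check is an inference that is not stated in this ticket: it assumes 48-bit virtual addressing, and that `mutex_owner_running()` treats the lock word as an owner thread pointer and dereferences it. If so, a non-canonical value such as 0x4000000000000000 would be consistent with the #gp trap.

```python
# Values copied from the mdb output above.
KMEM_BUF_BASE = 0xfffffe23d267d000    # start of the kmem_alloc_2048 buffer
Z_ZFSVFS = 0xfffffe23d267d001         # znode_t z_zfsvfs as printed
Z_ZNODES_LOCK_OFFSET = 0x428          # offset of z_znodes_lock in zfsvfs_t
LOCK_ADDR = 0xfffffe23d267d429        # address mdb printed for the lock
LOCK_WORD = 0x4000000000000000        # _opaque[0] read at that address


def lock_address(zfsvfs_ptr, offset=Z_ZNODES_LOCK_OFFSET):
    """Address of z_znodes_lock for a given zfsvfs_t pointer."""
    return zfsvfs_ptr + offset


def misalignment(addr, align=8):
    """Bytes past the previous `align`-byte boundary (0 means aligned)."""
    return addr % align


def is_canonical(addr):
    """Assumption: 48-bit virtual addresses, so bits 63..47 must all match."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff


if __name__ == "__main__":
    print(hex(lock_address(Z_ZFSVFS)))       # where the lock is read from
    print(Z_ZFSVFS - KMEM_BUF_BASE)          # bytes into the 2k buffer
    print(misalignment(LOCK_ADDR))           # non-zero: kmutex_t is misaligned
    print(is_canonical(LOCK_WORD))           # lock word read as a pointer
```

Run against the values in this dump, the computed lock address matches what mdb printed, and both the pointer and the lock sit one byte past an 8-byte boundary.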
Igor offers a hint:
https://github.com/joyent/illumos-joyent/issues/182#issuecomment-439547843
"i saw some similar panics on DilOS with mutex_owner_running() and it was fixed by reverting changes with dependency to libfakekernel - where we can overlap of mutexes"
Per @accountid:62431b8f258562006fa2866a in OS-5439: "Note how it has the 0x1 there. That means this is an invalid pointer and in fact one that has likely been touched by the POINTER_INVALIDATE macro."
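To make the quoted remark concrete, here is a minimal sketch of the low-bit tagging it describes. It models, as best I recall, the `POINTER_INVALIDATE` and `POINTER_IS_VALID` macros in `zfs_znode.c` (set bit 0 to invalidate; treat a pointer as valid only when its low two bits are clear). Those definitions are reproduced from memory and should be checked against the source for this release; the Python function names are hypothetical.

```python
def pointer_invalidate(ptr):
    """Tag a pointer as invalid by setting its low bit (assumed macro behaviour)."""
    return ptr | 0x1


def pointer_is_valid(ptr):
    """A pointer counts as valid only if its low two bits are clear (assumed)."""
    return (ptr & 0x3) == 0


if __name__ == "__main__":
    original = 0xfffffe23d267d000     # start of the kmem_alloc_2048 buffer
    observed = 0xfffffe23d267d001     # z_zfsvfs as printed by mdb

    # Invalidating the buffer's base address reproduces the observed value.
    print(hex(pointer_invalidate(original)))
    print(pointer_is_valid(observed))
```

Under these assumptions, the observed `z_zfsvfs` is exactly what an invalidated pointer to the start of the 2k buffer would look like, which fits the znode having already been torn down.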
Unlike OS-5439, the vnode here is not the zonepath but a shared library:
    > $C
    fffffcc279a264e0 mutex_owner_running+0xd()
    fffffcc279a26520 zfs_znode_free+0x37(fffffe7c62cc7510)
    fffffcc279a26580 zfs_inactive+0x16e(fffffe7c62cc3800, ffffff913bdd3c90, 0)
    fffffcc279a265e0 fop_inactive+0x76(fffffe7c62cc3800, ffffff913bdd3c90, 0)
    fffffcc279a26610 vn_rele+0x8a(fffffe7c62cc3800)
    ...
    > fffffe7c62cc3800::print vnode_t v_path
    v_path = 0xfffffe2969fa6558 "/zones/87f7df0c-6e05-ed6b-b18a-91560eb21ec9/root/usr/lib64/libkrb5.so.3"
As in OS-5439, the znode being freed has already been freed.
    > fffffe7c62cc7510::whatis
    fffffe7c62cc7510 is freed from zfs_znode_cache
There are also no other threads that may be actively mucking with this structure.
    > ::stacks -m zfs
    THREAD           STATE    SOBJ                COUNT
    fffffcc269c92c20 SLEEP    CV                      3
                     swtch+0x141
                     cv_wait+0x70
                     zthr_procedure+0x61
                     thread_start+8

    fffffcc269d35c20 SLEEP    CV                      1
                     swtch+0x141
                     cv_timedwait_hires+0xec
                     cv_timedwait+0x5c
                     l2arc_feed_thread+0xad
                     thread_start+8

    fffffcc26f4c8c20 SLEEP    CV                      1
                     swtch+0x141
                     cv_timedwait_hires+0xec
                     cv_timedwait+0x5c
                     txg_thread_wait+0x5f
                     txg_sync_thread+0x121
                     thread_start+8

    fffffcc269cb5c20 SLEEP    CV                      1
                     swtch+0x141
                     cv_timedwait_hires+0xec
                     dbuf_evict_thread+0xef
                     thread_start+8

    fffffcc269c98c20 SLEEP    CV                      1
                     swtch+0x141
                     cv_timedwait_hires+0xec
                     zthr_procedure+0xa3
                     thread_start+8

    fffffcc26d64ac20 SLEEP    CV                      1
                     swtch+0x141
                     cv_wait+0x70
                     spa_thread+0x1db
                     thread_start+8

    fffffcc26d8eec20 SLEEP    CV                      1
                     swtch+0x141
                     cv_wait+0x70
                     txg_thread_wait+0xaf
                     txg_quiesce_thread+0x126
                     thread_start+8

    fffffe2a59e5cc00 PANIC    <NONE>                  1
                     page_ctr_sub_internal+0x65
                     page_ctr_sub+0x7e
                     page_get_mnode_freelist+0x3ff
                     page_get_freelist+0x16d
                     0xfffffcc1e60f8aa0
                     swap_getapage+0x323
                     htable_getpte+0x94
                     0x30c72e0
                     die+0x89
                     trap+0x1310
                     cmntrap_pushed+0x3c
                     mutex_owner_running+0xd
                     zfs_znode_free+0x37
                     zfs_inactive+0x16e
                     fop_inactive+0x76
                     vn_rele+0x8a
                     lookuppnvp+0x3ff
                     lookuppnatcred+0x176
                     lookupnameatcred+0xdd
                     lookupnameat+0x39
                     vn_openat+0x315
                     copen+0x204
                     openat+0x2a
                     lx_openat+0x8b
                     lx_open+0x25
                     lx_syscall_enter+0x19b
                     sys_syscall+0x142

    >