After successfully completing a ‘zfs replace’ I was not so pleased to get the following error message back from ‘zfs detach’:
cannot detach c5t17d0: no valid replicas
I decided that I would upgrade this OpenSolaris 2008.11 instance to OpenSolaris 2009.06 in order to see if the obvious bug that I was encountering was resolved in the newest version. Since upgrading in OpenSolaris supports automatic boot environment creation, there really is not much danger at all in updating because you can always boot back into the other environment at any time.
The upgrade was a success, and after I booted into 2009.06 I was able to simply detach the failed drive from the pool and thus remove it from the system.
I recompiled gluster and I ran 2009.06 for a couple of days, until I started noticing that the server was rebooting during times of high I/O. A peek inside ‘/var/adm/messages’ revealed the following errors:
Aug 15 22:33:04 cybertron ^Mpanic[cpu0]/thread=ffffff091060c900:
Aug 15 22:33:04 cybertron genunix: [ID 783603 kern.notice] Deadlock: cycle in blocking chain
Aug 15 22:33:04 cybertron unix: [ID 100000 kern.notice]
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9651f0 genunix:turnstile_block+795 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965250 unix:mutex_vector_enter+261 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9652f0 zfs:zfs_zget+be ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965380 zfs:zfs_zaccess+7c ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965400 zfs:zfs_lookup+333 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9654a0 genunix:fop_lookup+ed ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965550 genunix:xattr_dir_realdir+8b ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9655a0 genunix:xattr_dir_realvp+5e ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9655f0 genunix:fop_realvp+32 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965640 genunix:vn_compare+31 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965860 genunix:lookuppnvp+94c ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965900 genunix:lookuppnatcred+11b ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965990 genunix:lookuppnat+69 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965b30 genunix:vn_createat+13a ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965cf0 genunix:vn_openat+1fb ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965e50 genunix:copen+435 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965e80 genunix:openat64+25 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965ec0 genunix:fsat32+f5 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965f10 unix:brand_sys_sysenter+1e0 ()
Aug 15 22:33:04 cybertron unix: [ID 100000 kern.notice]
Aug 15 22:33:04 cybertron genunix: [ID 672855 kern.notice] syncing file systems…
Aug 15 22:33:04 cybertron genunix: [ID 904073 kern.notice] done
My efforts to find any further detailes about this bug are ongoing, so at this point I have booted back into 2008.11 and I will be running that until a fix or a workaround is found.