I haven’t made a post in a few weeks, which is unfortunate. However, my excuse this time is a good one: I broke my right clavicle playing Ultimate Frisbee. It was a failed dive into the end zone, followed by a trip directly to the hospital for x-rays. Because of that, life at home and at work (once I went back) has been busy, to say the least.
Since I’ve been back, I’ve been busy setting up my new backup storage device, based on the SuperMicro SC847 chassis. My plan was to use 18 x 3.0 TB drives in a RAIDZ2 array provided by FreeNAS. This would then be presented to my Backup Exec 2012 server through iSCSI to be used as a deduplication storage device.
During my testing before the purchase, everything worked well and it seemed like a great idea. Even after I received the parts and built the unit, my testing showed that this was a good solution.
However, the second goal of this unit was to place it across the parking lot in a secondary ‘off-site’ building, connected by two 802.11ac wireless devices. Once this move was completed, performance over that wireless link turned out to be no better than my existing 802.11a network over the same stretch, and then a drive in the FreeNAS array failed.
It’s not a mystery that I’m not a Unix/FreeBSD guru, and the use of ZFS is completely new to me. Because of that, I had difficulty troubleshooting and performing the proper steps to take the array out of its degraded state (there’s a bit more detail there, as it wasn’t simply a failed drive, but it’s not entirely relevant to this post).
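For anyone in a similar spot, the standard ZFS recovery workflow is short once you know it. This is only a sketch: the pool name `tank` and the device name `da5` are placeholders, not the actual names from my array.

```shell
# Show pool health and identify which device is FAULTED/UNAVAIL
zpool status -v tank

# Swap in a replacement disk, then tell ZFS to rebuild onto it
# (da5 is a placeholder for the failed device's name)
zpool replace tank da5

# Watch resilvering progress; the pool returns to ONLINE when it finishes
zpool status tank

# Once the resilver completes, clear any lingering error counters
zpool clear tank
```

A RAIDZ2 pool stays available through this whole process, since it tolerates two failed drives, but it runs degraded until the resilver completes.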
At this point I had to take a step back and admit that while FreeNAS and ZFS were a good idea in theory, with my department’s experience and standardization on Windows, it was a mistake to implement. There’s a large amount of risk in having a backup solution that isn’t easy to maintain and fix, especially when the person who set it up is out for a week (with a broken bone or something; one of those rare events that never happen, right?).
I’m not entirely sure what I’m going to replace FreeNAS with; perhaps I’ll use the SAS HBA to build the array with Windows on top, or perhaps Windows Server 2012 storage pools/spaces.
What I’m actually getting at is that it’s OK to be wrong; it’s OK to make a mistake and need to re-design a solution. One person, or even a team of people, won’t design perfect infrastructure every time. The important part is to humbly admit it, and then improve. Make something better, and do a lessons-learned/post-mortem to ensure that the right questions are asked the next time a project comes up on the radar.