Core Switch upgrade

Last night I performed a network upgrade, replacing an old and potentially failing 3Com 3848 with a Dell PowerConnect 5548.

The network design at my head office leaves a lot to be desired, but unfortunately I don’t have the resources to do a complete overhaul. Right now we’ve got a single core switch, which is linked to our SonicWall NSA 2400 by one switch port.
Every other access switch and all servers connect directly to this core switch. Almost everything uses two-port link aggregation for its connection to the core. There’s zero daisy-chaining of switches, and since everything connects directly to the core, I’ve got STP disabled on the core. Our access switches are mostly PowerConnect 2724s anyway, which don’t support STP.
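For reference, a rough sketch of what one of those two-port LAGs looks like from the 5548’s CLI — the exact syntax varies by firmware version, and the port range and channel-group number here are made up for illustration:

```
console# configure
console(config)# interface range gigabitethernet 1/0/1-2
console(config-if-range)# channel-group 1 mode auto
console(config-if-range)# exit
console(config)# interface port-channel 1
console(config-if)# description Uplink-to-access-switch-1
```

Mode auto negotiates the group with LACP; a static group (mode on) is the usual fallback when the far end is a cheaper switch that doesn’t speak LACP.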

This is a very simple and effective network layout, but it’s far from redundant. If that core switch dies, our entire network is down, including every service we provide to external clients.

My future goal is to replace the 2724s with 2824s that do support STP, and then, instead of LAG groups to a single switch, use two uplinks to two separate core switches. The only thing currently beyond my knowledge is how to link those two core switches to our SonicWall redundantly without creating a network loop or routing issues.

Replacements like this are one of the high points of my job. It seems simple, basic, and low on skill requirements, but there’s something about meticulously planning and then implementing hardware that is very relaxing and enjoyable. I don’t know if others share this sentiment, but racking things is definitely a job perk I wouldn’t want to give up.

MD3220i firmware update

There isn’t much out on the interwebs about the MD3220i, so I thought I’d share my experience updating its firmware.

Before the Christmas break, Dell called to say that a very important firmware update had been released and should be applied as soon as possible. Due to scheduling constraints, the earliest maintenance window I could arrange was last week.

I was previously on firmware 10.70, so to fully update to 10.80 I needed to use the bridge firmware to get to 10.75 first.

Luckily I realized that I needed to update the “PowerVault Modular Disk Storage Manager Client” before doing the firmware update. I uninstalled the previous version, and then reinstalled from this package:

ftp://ftp.dell.com/FOLDER88591M/1/DELL_MDSS_Consolidated_RDVD_3_0_0_18_A00_R314542.iso

However, after installing from the setup.exe in that ISO, the actual “PowerVault Modular Disk Storage Manager Client” was nowhere to be found.

Eventually I figured out that, from the install source, you also need to go into one of the subdirectories and run a different executable to get the MDSM client.

Of course, I started drafting this post while I still had that install source, and now I realize I must have deleted it, so I can’t give the exact path of the executable that’s needed. If I download the package again, I’ll make sure to update this post with the right path.

Once I had the MDSM client updated, I applied the bridge firmware first and then the 10.80 firmware. During the process, the MDSM gave specific information about which controller it was updating, and all of my resources running from the MD3220i were uninterrupted.

Firmware updates are always a little stressful, even more so when it’s the company SAN undergoing the update; I was very glad this one was smooth and painless.

My Hyper-V cluster logs went wild, though, reporting reachability failures for all hosts. Nothing to worry about, since there are dual controllers, but something to be aware of.

Recent performance testing and results

I’ve just completed some performance testing of an orthophotography component within PCI Geomatica and thought I’d share the results. They aren’t very surprising based on past experience, but they are a little disappointing.

After repeated discussions, we purchased the following system from Dell specifically for this purpose:

Precision T5500

2 x Intel Xeon E5607 2.27 GHz
24 GB (6x4GB – 1600 MHz DDR3)
2 x 256 GB SSD – RAID 1
2 x 1.5 TB SATA – RAID 1
2 x nVidia Quadro 2000 1GB

Total Cost: $5700

We insisted on benchmarking this desktop before it was put into production use, because we didn’t think it would perform as well as the manager expected. This desktop was considered our baseline.

Using OrthoEngine with a 4.5 GB .pix file, our existing computers (Pentium 4, 2006 era) were taking over 5 hours to complete an epipolar generation. The latest version of Geomatica, 2012, had just been released and promised support for x64 systems and increased utilization of multiple cores and processors.

Our baseline system completed the operation in 40 minutes, which, while a huge improvement over our previous systems, isn’t surprising considering it’s five years newer.

The following table illustrates some of the findings while changing around the hardware configuration.

Setup                                     Start   End     Duration (minutes)   % change from baseline
Baseline                                  9:48    10:28   40                   0.0%
Baseline test #2 (no change in settings)  1:27    2:09    42                   5.0%
SATA drive (2-year-old 160 GB)            2:31    3:26    55                   37.5%
One processor removed                     3:47    4:26    39                   2.5%
Baseline (affinity set to one core)       7:23    8:05    42                   5.0%
8 GB RAM (2 x 4 GB)                       8:31    9:13    42                   5.0%
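For what it’s worth, the % change column is just each run’s duration measured against the 40-minute baseline, reported as an unsigned change; a quick sketch of the arithmetic:

```python
def pct_change(duration_min: float, baseline_min: float = 40.0) -> float:
    """Unsigned percent change in runtime relative to the baseline run."""
    return round(abs(duration_min - baseline_min) / baseline_min * 100, 1)

# Durations (minutes) from the table above
print(pct_change(42))  # baseline re-test  -> 5.0
print(pct_change(55))  # older SATA drive  -> 37.5
print(pct_change(39))  # one CPU removed   -> 2.5
```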

From the results, we can see that there is around a two-minute margin of error, and that:

  • An older SATA drive greatly reduces performance
  • An additional processor provides no benefit
  • Additional cores provide no benefit
  • RAM above 8 GB provides no benefit

It was clear after these tests that the purchased desktop was terrible value. I figured it was worth running the same benchmark on our current standard CAD system:

Optiplex 990

Intel Core i7-2600 – 3.4 GHz
8.0 GB (2x4GB – 1333MHz DDR3)
500 GB SATA 7200rpm
ATI Radeon HD6450 1GB

Total Cost: $1480

The following are the results compared to the baseline 40 minute process:

Setup                               Start   End     Duration (minutes)   % change from baseline
Optiplex                            4:45    5:15    30                   25.0%
Optiplex (processor priority high)  7:19    7:48    29                   27.5%
Optiplex (Geomatica 10.3)           9:29    10:01   32                   20.0%
Optiplex (4 GB RAM – one stick)     10:47   11:28   41                   2.5%
Optiplex (with SSD)                 4:33    4:57    24                   40.0%

Here’s where it gets interesting. Our CAD desktop is clearly much faster than the baseline, and this comes down to its higher CPU clock speed.

I tried the previous version of the software (10.3) to see whether the 2012 release had actually improved anything; at only two minutes slower, the older version is effectively the same.

Dropping to 4 GB of RAM did produce a difference, which makes sense given the data set is 4.5 GB. As the earlier tests showed, anything beyond 8 GB of RAM made no difference.

Best overall performance was with the faster processor and an SSD.

We ran another test, DEM Extraction, on the files produced by the epipolar generation above. These were 2.5 GB files, which I’m told took multiple days to process on our current systems.

Setup                 Start   End            Duration (minutes)   % change from baseline
Baseline (Precision)  2:09    7:21 (15th)    1032                 0.0%
Optiplex              1:23    11:08 (14th)   585                  43.0%

The baseline system did the job in a bit over 17 hours, while our CAD desktop finished in 9 hours 45 minutes!
Since then, we’ve returned the Precision, and I’ve convinced the department manager to let us bring in a custom desktop with an Intel i7-2600K, which we’ll overclock to 4.x GHz for the best performance.

Overall, this matches our AutoCAD Civil 3D benchmarking, where the largest performance gains came from a high clock-speed CPU.

The problem now is finding 8 GB DIMMs that are unbuffered, non-ECC, to go into a P67 motherboard.

Ubuntu on Hyper-V issues with Integration Services

I’ve set up a new VM with Ubuntu 10.10 server, running on our Hyper-V 2008 R2 SP1 cluster.

I followed the excellent walkthrough by Ben Armstrong here to make sure I wasn’t missing anything.

However, after the install completed, I immediately ran into problems. After following the instructions to enable the Integration Services and rebooting, there was still no network connectivity, and I started seeing strange call trace errors during the boot process.

On top of that, the VM’s CPU was constantly running at 100%.

After further reboots, the VM would lock up entirely and become unresponsive.

I restarted the install, but this time took a snapshot immediately after first login and started investigating.

Running the “top” command, I found the ksoftirqd process taking 100% of the CPU. Looking into that brought me to this forum post:

http://ubuntuforums.org/showthread.php?t=1494797

It sounds very similar to my environment, as I’m using Dell R410s with Broadcom NICs.
Following those instructions, I was able to disable the integration components and add a legacy network adapter, and now the VM is running just fine.

Of course, I’m going to have to set up NTP now and accept a bit of a performance hit, but in this instance that’s fine.
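As a starting point, a minimal /etc/ntp.conf sketch for the VM, assuming the standard Ubuntu pool servers are reachable from my network (install the ntp package first):

```
# /etc/ntp.conf — minimal sketch
driftfile /var/lib/ntp/ntp.drift
server 0.ubuntu.pool.ntp.org iburst
server 1.ubuntu.pool.ntp.org iburst
server 2.ubuntu.pool.ntp.org iburst
```

It’s also worth unchecking Time synchronization under the VM’s Integration Services settings in Hyper-V Manager, so the hypervisor and NTP aren’t fighting over the clock.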

If I ever do find out the source of this issue, I’ll edit this post.