Sonicwall Global VPN disconnecting repeatedly

For a while now I’ve had my Sonicwall Global VPN policy on the firewall set as a “route all” connection. This means that all of the VPN client’s traffic goes through the Sonicwall, which blocks the client’s access to devices on its own local network.

Yesterday I came upon a situation where I needed to give a client access to both the VPN and local devices at the same time. This called for Split Tunnels!

However, I didn’t want to enable split tunnels universally for all my VPN clients. Luckily I found this Sonicwall documentation on setting up a single WAN GroupVPN with two different policies based on user group.

The premise is that you set up your WAN GroupVPN as a split tunnel, but then give certain users access only to a specific address object and use a specific NAT Policy for them (I won’t regurgitate the entire document here).
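To make the difference between the two modes concrete, here is a minimal Python sketch of the routing decision a VPN client makes. It is not how the Sonicwall client is implemented; the function name `next_hop` and the `10.10.0.0/16` corporate network are assumptions made up for illustration.

```python
import ipaddress

# Hypothetical corporate network reachable through the VPN (illustration only).
CORPORATE_NETS = [ipaddress.ip_network("10.10.0.0/16")]


def next_hop(dest, split_tunnel, vpn_nets=CORPORATE_NETS):
    """Return "vpn" or "local" for a destination IP address."""
    addr = ipaddress.ip_address(dest)
    if not split_tunnel:
        # "Route all": every packet goes through the tunnel,
        # so devices on the client's own LAN become unreachable.
        return "vpn"
    # Split tunnel: only the listed networks go through the tunnel.
    return "vpn" if any(addr in net for net in vpn_nets) else "local"


print(next_hop("192.168.1.50", split_tunnel=False))  # vpn
print(next_hop("192.168.1.50", split_tunnel=True))   # local
print(next_hop("10.10.5.20", split_tunnel=True))     # vpn
```

With the Sonicwall setup described above, which behaviour a user gets depends on the address objects granted to that user rather than on a single flag.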

This was working great, but I soon found that when testing as the split tunnel user, I would connect and then get disconnected within 10 seconds. Typically the connection would last for 3 successful pings.

After a bit of Googling I found this article, which explained that the disconnects are caused by an incorrect address object on the user’s “VPN Access” tab.

I checked that out, and strangely enough, only the correct item was listed:

I looked at both the “Everyone” and “Trusted Users” groups, and they looked the same.

After a lot of head scratching, I finally discovered that the “Everyone” group did in fact have the “All Interface IP” object applied to it, which I found by viewing a logged-in user’s status here:

Somehow that was still selected for the “Everyone” group, but it just wasn’t displaying when viewing the “VPN Access” tab. So I clicked “remove all”, and then re-added the appropriate objects, and problem solved!

 

Unicast Flooding on PowerConnect 5548

Yesterday I learned something new: it’s possible for a switch to stop operating as a switch and start flooding all unicast packets out every interface. I just solved exactly this on a Dell PowerConnect 5548 switch.

In retrospect, this happened a few months ago too, but at the time I couldn’t spend any time troubleshooting, and rebooting the switch resolved the problem. This time I wanted to get to the source of the problem.

I first noticed a problem when accessing network resources was a bit slower than normal. I took a quick look at our network weathermap (a combination of Cacti and the weathermap plugin) and noticed that all ports coming out of our 5548 were pushing ~90 Mbps of traffic, which is definitely not normal.

Weathermap output (taken during normal traffic)

I logged into our Cacti interface and took a look at the graph for one of the interfaces on that switch:

Based on that graph, I could see that the traffic started Tuesday morning and was pretty consistent. The interesting thing is that this was happening on ALL the interfaces on the switch, including the link aggregation groups.

PowerConnect monitoring graph

At this point I spent quite a bit of time trying to figure out whether it was a reporting problem (since the devices on the other end of those connections weren’t reporting high traffic) or an actual traffic issue. I hadn’t heard of unicast flooding before, so I didn’t immediately start looking there.

I started a Port Mirror from one of the ports that should have almost zero traffic, and Wireshark gave me hundreds of thousands of packets within a few seconds, all for devices not actually on the port I was mirroring. At this point I understood what was happening, but not why.

A quick Google search led me to the term “Unicast Flood” and some probable causes, but none of them really applied. My network topology is flat, a single VLAN with no STP. CPU utilization was low, and the address table only had 8 entries in it.

Wait, 8 entries? A core switch should have hundreds of entries in its address table, right? I was experiencing a unicast flood because the switch wasn’t properly storing MAC addresses in its table, so almost all traffic was treated as having an unknown destination and pushed out every interface.
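To show why a near-empty address table leads to flooding, here is a toy Python model of a learning switch. It is only a sketch of the general forwarding logic, not of the PowerConnect firmware; the class name, the `learning_enabled` flag, and the shortened MAC addresses are all made up for illustration.

```python
class LearningSwitch:
    """Toy model of a layer 2 switch with a MAC address table."""

    def __init__(self, ports, learning_enabled=True):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on
        self.learning_enabled = learning_enabled

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        if self.learning_enabled:
            # Learn (or re-learn) which port the sender lives on.
            self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood out every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress port, nothing to forward
        return [out_port]


healthy = LearningSwitch(ports=[1, 2, 3, 4])
print(healthy.receive(1, "aa:aa", "bb:bb"))  # [2, 3, 4]  first frame is flooded
print(healthy.receive(2, "bb:bb", "aa:aa"))  # [1]        aa:aa was learned
print(healthy.receive(1, "aa:aa", "bb:bb"))  # [2]        bb:bb was learned

broken = LearningSwitch(ports=[1, 2, 3, 4], learning_enabled=False)
print(broken.receive(1, "aa:aa", "bb:bb"))   # [2, 3, 4]
print(broken.receive(2, "bb:bb", "aa:aa"))   # [1, 3, 4]  still flooding
```

Once learning stops, every frame falls into the "unknown destination" branch, which matches what the port mirror showed: traffic for other devices arriving on a port that should have been nearly idle.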

Back to Google, and I eventually came across the following in the release notes for a firmware version from October 2011:

Description: Devices stop to learn MAC addresses after 49.7 days

User Impact: After 49.7 days of operation, the device stops re-learning MAC addresses. These MACs which were previously learned will not appear in MAC address table. As a result traffic streams sent to previously learned MAC addresses are treated as unknown-unicast traffic and flooded within the VLAN.

Resolution: MAC address learning mechanism was fixed so that both learning new addresses and re-learning existing addresses are updating the MAC Address database.
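The release notes don’t say why the limit is 49.7 days. One plausible explanation (my assumption, not something Dell states) is an unsigned 32-bit counter of milliseconds wrapping around, since the arithmetic lands on the same figure:

```python
# Assumption: uptime is tracked in milliseconds in an unsigned 32-bit counter.
MS_PER_DAY = 24 * 60 * 60 * 1000   # 86,400,000 ms in a day
COUNTER_VALUES = 2 ** 32           # 4,294,967,296 distinct values

wrap_days = COUNTER_VALUES / MS_PER_DAY
print(round(wrap_days, 1))  # 49.7
```

If that is the cause, the switch would start misbehaving roughly 49.7 days after each boot, which also fits with a reboot clearing the problem for a while.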

That’s one mighty big bug to have on a core switch. It turns out that I hadn’t updated the switch to the latest firmware when I first received it in February 2012 (nor was it shipped with current firmware), which is very uncharacteristic of me.

Today I updated the firmware to the latest, and we’ll see what happens in 49.7 days.

 

Core Switch upgrade

Last night I performed a network upgrade, replacing an old and potentially failing 3Com 3848 with a Dell PowerConnect 5548.

The network design that my head office has leaves a lot to be desired, but unfortunately I don’t have the resources to do a complete overhaul. Right now we’ve got a single core switch, which is linked to our Sonicwall NSA 2400 by one switch port.
Every other access switch and all servers connect directly to this core switch. Almost everything uses 2 port Link Aggregation for its connection to the core. There’s zero daisy chaining of switches, and since everything connects directly to the core, I’ve got STP off on the core. Our access switches are mostly PowerConnect 2724s anyway, which don’t support STP.

This is a very simple and effective network layout, but it’s far from redundant. If that core switch dies, our entire network is down, including every service we provide to external clients.

My future goal is to replace the 2724s with 2824s that do support STP, and then instead of LAGs to a single switch, I’ll use two uplinks to two separate core switches. The only thing currently beyond my knowledge is how to link those two core switches to our Sonicwall redundantly without creating a network loop or routing issues.

Replacements like this are one of the high points of my job. It seems simple, basic and low on the skill requirements, but there’s something about meticulously planning and then implementing hardware that is very relaxing, very enjoyable. I don’t know if others share this sentiment, but racking things is definitely a job perk that I wouldn’t want to give up.

Motorola Defy and Exchange ActiveSync

I’m working on a problem right now involving a Motorola Defy phone and Exchange 2003 ActiveSync. I’m posting this as a means of gathering my thoughts, and in the hope that anyone else having this problem will find it and join the discussion.
About 15 of these phones have been brought in for our field staff since they’re rugged and reliable. We’ve broken too many HTC Desires out in the field to continue throwing away money like that.

 

The problem is, using the “Corporate Sync” application on the Defy, we can’t send email. Here’s what I know so far:

 

  • Only using an Exchange connection, not POP3 or IMAP
  • Attempting to send email over cell network fails
  • Attempting to send email over wifi (on the same network as our Exchange server) is successful
  • Attempting to send email over wifi (external network) fails (unconfirmed)
  • The Defys reportedly used to work properly, but haven’t since mid-September
  • This coincides with the network upgrades I made around that time, where I replaced our firewall with a Sonicwall NSA 2400 and replaced our Reverse Proxy (running Apache) at the same time.
  • All Exchange traffic goes through the reverse proxy
  • No other ActiveSync devices are having a problem (Windows Mobile 6.5, iOS, Android)

 

I’ve disabled all the security features on the Sonicwall in case they were causing the problem, to no avail. I’ve looked over the Apache logs on the reverse proxy, but don’t see anything unique to the Defy. I’ve compared the reverse proxy config between the new and old servers, and there is no difference.
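One way to make the log comparison less of an eyeball exercise is to tally the status codes of ActiveSync send requests per device type. The Python sketch below assumes a few things that may not match your setup: that the proxy writes the usual Apache common/combined log format, and that the devices send the command as plain query parameters (`Cmd=SendMail`, `DeviceType=...`) to `/Microsoft-Server-ActiveSync`. The sample lines and device names are made up and do not come from my logs.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Matches the request and status fields of a common/combined log line.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" (?P<status>\d{3})')


def sendmail_statuses(lines):
    """Count (DeviceType, HTTP status) pairs for ActiveSync SendMail requests."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        parts = urlsplit(match.group("url"))
        if not parts.path.lower().endswith("/microsoft-server-activesync"):
            continue
        query = parse_qs(parts.query)
        if query.get("Cmd", [""])[0].lower() != "sendmail":
            continue
        device = query.get("DeviceType", ["unknown"])[0]
        counts[(device, int(match.group("status")))] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Oct/2010:10:00:00 -0700] "POST /Microsoft-Server-ActiveSync?User=a&DeviceId=1&DeviceType=DeviceA&Cmd=SendMail HTTP/1.1" 200 0',
    '203.0.113.6 - - [01/Oct/2010:10:01:00 -0700] "POST /Microsoft-Server-ActiveSync?User=b&DeviceId=2&DeviceType=DeviceB&Cmd=SendMail HTTP/1.1" 500 0',
    '203.0.113.6 - - [01/Oct/2010:10:02:00 -0700] "POST /Microsoft-Server-ActiveSync?User=b&DeviceId=2&DeviceType=DeviceB&Cmd=Sync HTTP/1.1" 200 120',
    '203.0.113.7 - - [01/Oct/2010:10:03:00 -0700] "GET /index.html HTTP/1.1" 200 512',
]

for (device, status), count in sorted(sendmail_statuses(SAMPLE_LINES).items()):
    print(device, status, count)
```

If one device type shows only failing statuses, or never shows a SendMail request at all, that at least narrows down whether the request is reaching the proxy.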

 

This is frustrating because it seems to only be a problem with the Motorola Corporate Sync app, but it’s only a problem when traffic flows through my reverse proxy or firewall. It’s such a specific set of circumstances, and I can’t figure out what is causing it.

If I do come upon a solution, I’ll definitely post an update.

 

VMware Server Bridged networking not working

I recently had a problem where I restarted a server running VMware Server 2, and when the server came back up, there was no network connectivity for any of the virtual machines.

At first I began troubleshooting within the VM itself, thinking it was something to do with Ubuntu on the VMs I have. This didn’t produce any results, but I did notice the VM could reach the host IP and nothing beyond the host.

Eventually I discovered that vmnet0 had bound itself automatically to one of the physical adapters on my server, rather than the Team adapter that is set up through our Broadcom NICs.

To resolve this, you need to make sure that vmnet0 isn’t auto-bound.

In the Start Menu, under VMware, open “Manage Virtual Networks”.

Turn off automatic binding:

 

And then manually bind vmnet0 to the proper interface:


After applying, you should see connectivity return immediately.