Network up but DNS mysteriously broken

I was recently troubleshooting a family member’s computer after they reported “I can’t access the Internet”, and the resolution turned out to be something I’d never seen before.

This was a laptop with both an Ethernet and a Wi-Fi connection. Both were set to obtain their IP address and DNS server automatically via DHCP, and IPCONFIG displayed the correct information.

I could ping 8.8.8.8, confirming network connectivity, and NSLookup showed my gateway acting as the DNS server and correctly resolving external names.

However, as soon as any browser tried to resolve a DNS name, it failed. Chrome gave a “DNS_PROBE_FINISHED_NXDOMAIN” error, and IE simply stated “Page could not be found”.
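The key clue is that NSLookup sends its queries straight to the configured DNS server. Browsers, by contrast, typically resolve names through the operating system's resolver, which on Windows is the DNS Client stack that dnsapi.dll belongs to. Below is a minimal Node.js sketch that tests both paths side by side. The function names `classify` and `probe` are hypothetical, and this is a diagnostic aid rather than part of the original troubleshooting.

```javascript
// Sketch: compare a direct DNS query (similar to nslookup) with the OS resolver
// (the path a browser normally relies on). Requires Node.js 10.6 or later.
const dns = require("dns");

// Summarise the two outcomes as a one-line hint.
function classify(directOk, systemOk) {
  if (directOk && !systemOk) return "DNS server answers, but the OS resolver path fails";
  if (!directOk && systemOk) return "OS resolver answers (hosts file or cache?), but the DNS server does not";
  if (!directOk && !systemOk) return "neither path resolves the name";
  return "both paths resolve";
}

// Resolve `name` directly against `server`, then through the OS resolver.
async function probe(name, server) {
  // A Resolver with explicit servers queries them directly, like nslookup,
  // bypassing the hosts file and the OS resolver.
  const resolver = new dns.promises.Resolver();
  resolver.setServers([server]);
  const directOk = await resolver.resolve4(name).then(() => true, () => false);

  // lookup() calls getaddrinfo(), so it goes through the OS resolver
  // (hosts file, resolver cache, and the configured DNS servers).
  const systemOk = await dns.promises.lookup(name).then(() => true, () => false);

  return { directOk, systemOk, hint: classify(directOk, systemOk) };
}

module.exports = { classify, probe };
```

For example, `probe("www.google.com", "192.168.1.1").then(console.log)` runs both checks, where 192.168.1.1 is a placeholder for your gateway's address. On a machine like this one, I'd expect the first hint: the server answers but the OS resolver path fails.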

I checked the hosts file for malicious entries, made sure no proxy was enabled in IE, and verified that the routing table was normal.
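NSLookup skips the hosts file, but the browser's lookups do not. That makes a malicious hosts entry an obvious suspect for this exact symptom. Below is a minimal Node.js sketch that lists the active entries. The default path assumes a standard Windows install, and the helper names are hypothetical.

```javascript
// Sketch: list the active (uncommented) entries in the Windows hosts file.
const fs = require("fs");
const path = require("path");

// Parse hosts-file text into { address, names } records.
function parseHosts(text) {
  const entries = [];
  // Strip a leading byte-order mark, then handle both CRLF and LF endings.
  for (const rawLine of text.replace(/^\uFEFF/, "").split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // drop comments
    if (!line) continue;
    const [address, ...names] = line.split(/\s+/);
    entries.push({ address, names });
  }
  return entries;
}

// Read the hosts file from its usual location under %SystemRoot%.
function readHosts() {
  const root = process.env.SystemRoot || "C:\\Windows";
  return parseHosts(fs.readFileSync(path.join(root, "System32", "drivers", "etc", "hosts"), "utf8"));
}

module.exports = { parseHosts, readHosts };
```

On a clean install, `readHosts()` usually returns an empty list, because the default entries are all commented out.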

I ran ComboFix and GMER to look for rootkits and booted the computer into Safe Mode with Networking, but none of this resolved the issue.

Finally, I decided to install Wireshark and run Process Monitor while the browser attempted to connect, to see where these requests were going.

When I tried to run Wireshark after the install, though, it gave an error about a missing “dnsapi.dll” file. I verified the file was in the proper location (C:\Windows\System32). On a hunch, I decided to restore it with System File Checker (SFC), using this command from an elevated command prompt:

sfc /scanfile=c:\windows\system32\dnsapi.dll

The output confirmed that a corrupted file had been replaced, so I rebooted Windows. Once it came back up, all external browsing worked!

I suspect that some malware had gotten onto this machine and modified the dnsapi.dll file, and was later only partly removed.
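To confirm a system file has been tampered with before replacing it, one option is to hash it. You then compare the result with the same file from a healthy machine running the same Windows build and updates. Below is a minimal Node.js sketch using the built-in crypto module. The helper names are hypothetical.

```javascript
// Sketch: hash a file so it can be compared with a copy from a known-good machine.
const crypto = require("crypto");
const fs = require("fs");

// Return the SHA-256 digest of a file as a lowercase hex string.
function sha256File(file) {
  return crypto.createHash("sha256").update(fs.readFileSync(file)).digest("hex");
}

// True when both files have identical contents (compared by SHA-256).
function sameContents(fileA, fileB) {
  return sha256File(fileA) === sha256File(fileB);
}

module.exports = { sha256File, sameContents };
```

For example, `sameContents("C:\\Windows\\System32\\dnsapi.dll", "D:\\known-good\\dnsapi.dll")` does the comparison, where the second path is a placeholder for a copy taken from a healthy machine. Run it with a 64-bit Node.js. On 64-bit Windows, 32-bit processes that ask for System32 are silently redirected to SysWOW64.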

This one left me confused for a while, so hopefully this helps anyone else coming across the issue.

 

Nagios incorrect Hostname

I have certain infrastructure monitored by our parent company’s Nagios environment, so I’m not well versed in its setup or configuration. However, I’ve recently been receiving notifications for hosts where the host name does not match its actual defined value.

For example, I’ll receive an email stating:

Host: Office #2 (server #1)

where I would normally expect it to display “Office #1”.

This led me down a path of learning a bit about Nagios.

First, I browsed through the host’s monitoring settings to where the email notifications are defined. There I was able to find the command used to build the notification body:

/usr/bin/printf "%b" "** Nagios **\n\nAlert type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$ ($HOSTALIAS$)\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" |

The $HOSTNAME$ macro there is where the incorrect data was coming from. Google tells me this value is set in the host definition config file, which in my environment was located on the Nagios server here:

/opt/nagios/etc/hosts-standard.cfg

Finding my server definition in that file showed that it was entered correctly.
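Nagios builds the email by replacing each $MACRO$ in the notification command with the value it currently holds for the host. A correct definition file therefore doesn't guarantee a correct email if Nagios is holding a different value somewhere else. Below is a simplified Node.js illustration of that substitution. It is not Nagios code, and `expandMacros` is a hypothetical name.

```javascript
// Simplified sketch of Nagios-style $MACRO$ substitution.
// Nagios does this internally; this only illustrates where each field comes from.
function expandMacros(template, values) {
  // Replace each $NAME$ with values[NAME]; leave unknown macros untouched.
  return template.replace(/\$([A-Z0-9_]+)\$/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

module.exports = { expandMacros };
```

Feeding it the values from the email above, `expandMacros("Host: $HOSTNAME$ ($HOSTALIAS$)", { HOSTNAME: "Office #2", HOSTALIAS: "server #1" })` returns `"Host: Office #2 (server #1)"`. Whatever value is held for HOSTNAME goes straight into the message.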

With some luckier Google search terms, I came across an explanation.

It appears that Nagios keeps a “retention.dat” file, which saves state information across restarts and effectively caches old values, and this file is referenced when notifications are generated.

This file was found here in my environment:

/var/local/nagios-3.2.3/retention.dat
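Before changing the file, it may help to see what it actually holds for the host. Below is a Node.js sketch that pulls the host blocks out of retention.dat. It assumes the Nagios 3 layout of `host {` ... `}` blocks containing `key=value` lines, so check it against your own file first. The helper names are hypothetical.

```javascript
// Sketch: inspect the host blocks in a Nagios retention.dat file.
// Assumed layout (Nagios 3.x): "host {" ... "}" blocks of key=value lines.
const fs = require("fs");

// Parse retention.dat text into an array of { type, fields } blocks.
function parseRetention(text) {
  const blocks = [];
  let current = null;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    const open = line.match(/^(\w+)\s*\{$/);
    if (open) {
      current = { type: open[1], fields: {} }; // start of a new block
    } else if (line === "}" && current) {
      blocks.push(current); // end of the current block
      current = null;
    } else if (current && line.includes("=")) {
      const eq = line.indexOf("="); // split on the first "=" only
      current.fields[line.slice(0, eq)] = line.slice(eq + 1);
    }
  }
  return blocks;
}

// Find the retained host block whose host_name matches, if any.
function findHost(blocks, hostName) {
  return blocks.find((b) => b.type === "host" && b.fields.host_name === hostName);
}

// Read and parse a retention file from disk.
function readRetention(file) {
  return parseRetention(fs.readFileSync(file, "utf8"));
}

module.exports = { parseRetention, findHost, readRetention };
```

For example, `findHost(readRetention("/var/local/nagios-3.2.3/retention.dat"), "Office #1")` returns that host's retained fields. Nagios rewrites this file periodically and on shutdown, so any edits should be made while Nagios is stopped.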

I’ve asked my Nagios administrator to update this file, and I’ll update this post if it proves to be successful.

 

RadGrid Context Menu on a Button

With a Telerik RadGrid, the right-click header context menu isn’t obvious to my users, so I needed to add a button that opens the same menu.

Unfortunately, this isn’t a feature built into the RadGrid, and it took a lot of trial and error to get working.

Here’s what it looks like:

[Screenshot: context menu]

And here’s how to do it:

1. Add the following JavaScript to the script section of your page:

function showMenu(event, columnName) {
    // Get the grid's client-side object (RadGrid1 is the grid's server-side ID)
    var radGrid = $find("<%=RadGrid1.ClientID %>");
    var columns = radGrid.get_masterTableView().get_columns();
    // Open the header context menu of the column whose UniqueName matches
    for (var i = 0; i < columns.length; i++) {
        if (columns[i].get_uniqueName() == columnName) {
            columns[i].showHeaderMenu(event, 75, 20);
            break;
        }
    }
}

2. Make sure your RadGrid handles the “OnItemDataBound” event.

3. In your “OnItemDataBound” handler, add the following code:

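if (e.Item is GridHeaderItem)
{
    // Build an image button that opens the header context menu for one column
    ImageButton button = new ImageButton();
    button.ID = "ContextButton";
    button.Style.Add("padding-left", "5px");
    button.AlternateText = "ContextButton";
    button.ImageUrl = "../Images/Icons/contextMenu.png";
    // Call showMenu on the client and return false to cancel the postback
    button.OnClientClick = "showMenu(event, \"FullName\"); return false;";
    // Add it to header cell 2 (the first data column in my grid)
    e.Item.Cells[2].Controls.Add(button);
}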

Replace “FullName” with the UniqueName of an actual column in your grid.

Your button should now show up in header cell 2 (which for me was my first column) and behave exactly like the right-click context menu.

Hyper-V NIC Team Networking issue

I recently encountered an issue with a Hyper-V virtual machine that had me very confused. I still don’t have a great resolution, but I do have a functional workaround.

At a site I have a Server 2012 R2 Hyper-V host (Host), a Server 2012 R2 file server guest (FileServer) and a Server 2012 R2 domain controller (DC).

The Host has two NICs in a single switch-independent, address hash team. This team is bound to the vSwitch, which has “Allow management operating system to share this network adapter” enabled. We wanted a NIC team for network redundancy in case a cable was disconnected, since this network is provided by the site owner rather than my company.

Host and FileServer could reach DC, but nothing else could. DC could reach Host and FileServer, but not even its own default gateway.

This immediately looked like a misconfigured virtual switch on Host, since DC appeared to have access only to internal traffic. But I confirmed the vSwitch was set to “External”. If it were misconfigured, FileServer would presumably have been affected as well, but it was not.

I tried disabling VMQ and Large Send Offload on the Host NICs, since both features are known to cause problems, but that did not resolve the issue.

I also tried changing the teaming load-balancing algorithm to Hyper-V Port and then to Dynamic, but that didn’t resolve the issue either.

Then I decided to put one of the team’s NICs into Standby. This flipped which VM was reachable: suddenly nothing external could reach FileServer, but DC came online to external traffic.

I tried changing which NIC was in standby, but that still left one VM with no connectivity.

My assumption at this point is that the issue is caused by port security on the network switches, which we know the site owner uses. I suspect the team presents the same MAC address on multiple switch ports, which port security doesn’t like, so it blocks traffic from one side of the team. Because of how traffic is balanced across the team, that leaves one of the VMs unreachable.
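To make that theory a little more concrete, below is a toy Node.js model of port security. It assumes two common port-security behaviours: a limit on secured MACs per port, and rejection of a secured MAC that later appears on a different port. I don't know the site owner's actual settings, so the limit and the frame sequence are purely illustrative. `simulatePortSecurity` is a hypothetical name.

```javascript
// Toy model of switch port security (assumed behaviour, not the site's real config):
// each port secures at most `maxMacs` source MACs, and a MAC already secured on one
// port is rejected if it later shows up on a different port.
function simulatePortSecurity(frames, maxMacs) {
  const portMacs = new Map(); // port -> Set of MACs secured on that port
  const macPort = new Map(); // MAC -> port it was first secured on
  return frames.map(({ port, mac }) => {
    if (!portMacs.has(port)) portMacs.set(port, new Set());
    const secured = portMacs.get(port);
    if (macPort.has(mac)) {
      // Known MAC: only allowed on the port where it was first secured.
      return { port, mac, allowed: macPort.get(mac) === port };
    }
    if (secured.size >= maxMacs) {
      return { port, mac, allowed: false }; // port already at its MAC limit
    }
    secured.add(mac);
    macPort.set(mac, port);
    return { port, mac, allowed: true };
  });
}

module.exports = { simulatePortSecurity };
```

With `maxMacs` set to 1, for instance, a third MAC behind a two-port team is dropped whichever port its traffic is hashed to. A MAC that moves between team members is also dropped on its new port. Under those assumptions, either behaviour would leave one VM cut off.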

Unfortunately, we don’t control this network and can’t implement LACP, so I’ve had to remove the NIC team and go back to separate NICs for management and VM access.