I had been receiving this DFSR error in the event logs for some time, and couldn’t find a real resolution for it. The exact text of the error is:
Error: 1726 (The remote procedure call failed.)
Connection ID: 3880BBEC-6FC1-45B9-8750-196A7C32C9D8
Replication Group ID: B8242CE2-F5EB-47DA-BA5B-1DD2F7EE3AB9
This caused a break in replication, which wasn’t desirable during production hours. The strange thing was that it occurred every 5 minutes like clockwork, on all of our servers separated by a VPN.
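A quick way to confirm that kind of clockwork pattern is to compare the gaps between the event timestamps. The following is only a minimal sketch in Python: the function name `recurring_interval`, the 15-second tolerance, and the sample timestamps are all hypothetical and are not taken from the actual event log.

```python
from datetime import datetime, timedelta
from statistics import median


def recurring_interval(timestamps, tolerance=timedelta(seconds=15)):
    """Return the typical gap between events if the gaps are regular, else None."""
    ordered = sorted(timestamps)
    if len(ordered) < 3:
        return None  # too few events to call anything periodic
    # Gap between each event and the one that follows it.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    # Treat the events as periodic only if every gap is close to the median gap.
    if all(abs(gap - typical) <= tolerance for gap in gaps):
        return typical
    return None


# Hypothetical event times, e.g. copied by hand from the DFS Replication event log.
events = [
    datetime(2011, 9, 1, 9, 0, 2),
    datetime(2011, 9, 1, 9, 5, 1),
    datetime(2011, 9, 1, 9, 10, 3),
    datetime(2011, 9, 1, 9, 15, 2),
]
print(recurring_interval(events))
```

A result close to a round number such as 5 minutes is a hint that a fixed timer somewhere on the path, rather than load or link quality, is ending the connections.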
I eventually discovered it was a problem with our Sonicwall devices providing the VPN connection. There was a 5-minute timeout on TCP connections, which was being enforced on the DFSR connections for some reason.
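If you want to check whether a device on the path drops quiet TCP connections, one approach is to open a connection, leave it idle for slightly longer than the suspected timeout, and then see whether data still makes the round trip. The sketch below assumes the timeout is an inactivity timeout and that you can run a simple echo listener on the far side of the VPN; `survives_idle` and `start_local_echo` are hypothetical helper names, and the loopback self-check at the end only proves the probe works, since no firewall is involved there.

```python
import socket
import threading
import time


def survives_idle(host, port, idle_seconds, reply_timeout=5.0):
    """Connect, stay silent for idle_seconds, then check that an echo still comes back.

    Assumes a hypothetical echo service on host:port that returns what it receives.
    """
    with socket.create_connection((host, port), timeout=reply_timeout) as conn:
        time.sleep(idle_seconds)  # send nothing, so the connection looks idle
        try:
            conn.sendall(b"ping")
            return conn.recv(4) == b"ping"
        except OSError:
            # Covers resets and timeouts, i.e. the connection did not survive.
            return False


def start_local_echo():
    """Start a one-shot echo server on loopback for a self-check; return its port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(4))  # echo the first few bytes back
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


# Self-check against loopback with a very short idle period.
local_port = start_local_echo()
loopback_ok = survives_idle("127.0.0.1", local_port, idle_seconds=0.2)
print(loopback_ok)
```

Against a real link you would point `survives_idle` at the echo listener on the remote server and use an idle period just over the suspected timeout, for example 310 seconds for a 5-minute limit.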
While not an ideal solution, we have worked around this error by raising that TCP connection timeout to a sufficiently high value.
UPDATE Sept 2011: I realized that the majority of this post was describing the problem and not the solution, so I’ve updated with clear instructions on what I’ve done to resolve this.
To start, I created these rules only on the hub firewall at our head office. Creating them on the firewall at each branch office wasn’t necessary.
I created address objects for each of my DFS servers, and placed them into two groups – one for local (from the firewall’s perspective) and one for servers across a VPN link.
Then, using the firewall rules matrix, I created two rules, one in each of the indicated sections:
In the properties for each rule, on the Advanced tab, I increased the TCP connection timeout to a large value:
This was necessary for my Sonicwall Pro 4060 running SonicOS Enhanced 126.96.36.199-51e. In a couple of days we are replacing this with an NSA 2400 on SonicOS 5.8.x, so I’ll disable these rules to see if the issue still occurs on new hardware.