DFSR Crash and replacement search

A few weeks ago my DFSR database crashed, and crashed hard. I won’t go into the details of my troubleshooting steps, mostly because I didn’t take good notes while it was ongoing and I was very sleep deprived. Suffice it to say I spent many hours trying to resolve it and wasn’t successful.

On a Tuesday night I disabled the DFS folder targets on my branch office servers, forcing all of my remote users to access our namespace over the VPN. I was hesitant to do it since our fastest link is 5 Mbps, but it was the only way to ensure data integrity. After that we had to manually sync the data from our spoke servers to the hub, since there had been 3 days without replication.

While that was going on, I began looking for a solution to our problem: people in our branch offices need to work on the same files as the head office. These files are used across a variety of applications including AutoCAD and ArcGIS, so our users expect fast access to data that can be quite large.

This is something that is difficult to find information on; not many people are talking about how they handle branch office file collaboration, especially in a larger company.

In my case I tested PeerSync for a few days to see if it could replace DFSR; however, we encountered a few problems in our environment that made it unsuitable. In the end I re-implemented DFSR across our 2.5 TB of data and just waited for initial replication. This took another week.

Since then DFSR has been running smoothly; however, I’m still looking for a replacement that will scale with my company as we grow in offices and data size.

Right now, I’m considering two options:

  • Remote Desktop / VDI
  • WAN Acceleration

Both would require a fairly substantial initial capital investment, but with the growth my company has seen it is inevitable. It has been a long time since my last post because I’ve worked so much overtime lately on this issue in the midst of other projects, and I just haven’t had the mental capacity to sit down and write.

In the next week or two I’ll write up my thoughts on the two options above.

2 thoughts to “DFSR Crash and replacement search”

  1. profiler server v9.0
    9.0.5.0

    We had a similar situation and rebuilt our DFS replication from scratch for just 2 servers connected by a 400 Mbps pipe. We deleted all replication groups, waited for replication, then on both servers stopped the DFSR service and removed all DFSRPrivate folders and the DFSR folder in the System Volume Information folders.

    We then started the DFSR services and added one 7 GB folder for replication. It’s been almost a week and the backlog has gone from 25726 down to 23345. At this rate, initial replication will take roughly ten more weeks to finish.
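A straight-line extrapolation from the two backlog counts above gives a rough time-to-finish. This is only a sketch: it assumes the backlog keeps draining at the same rate, it treats "almost a week" as exactly one week, and the function name is made up for illustration.

```python
def weeks_to_drain(backlog_start: int, backlog_now: int, weeks_elapsed: float) -> float:
    """Estimate the remaining weeks, assuming the backlog keeps shrinking at a constant rate."""
    drained = backlog_start - backlog_now
    if drained <= 0 or weeks_elapsed <= 0:
        raise ValueError("backlog is not shrinking; no linear estimate is possible")
    rate_per_week = drained / weeks_elapsed
    return backlog_now / rate_per_week


# Backlog figures quoted in the comment; one week elapsed is an assumption.
estimate = weeks_to_drain(25726, 23345, 1.0)
print(f"about {estimate:.1f} more weeks")
```

Real backlogs rarely drain linearly, since users keep changing files, so treat the result as an order-of-magnitude figure at best.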

    Prior to this process we upgraded 2008 R2 to SP1, applied the latest DFSR KB update, and tuned the registry as recommended. None of the top 10 common issues seem to be occurring except #10, not letting users modify files, as I can’t expect people not to do their work.

    I still have replication running on a different drive that works 100% and is much larger. There I did a robocopy after initial replication was set up, which is a no-no, but it made initial replication finish super fast and there haven’t been any issues. I wanted to follow the by-the-book approach here, as the robocopy trick didn’t work in other attempts. I’m going to review my robocopy process and version, because a PowerShell hash-compare script shows that 65% or more of my files don’t have the same hash and would need to be copied over in full. That might be my culprit.
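The commenter's PowerShell script isn't shown, so here is a minimal Python sketch of the same idea: walk a source tree, hash each file, and compare it against the file at the same relative path on the target. The function names and the choice of SHA-256 are assumptions for illustration. Note that it hashes file content only; as far as I understand it, the hash DFSR uses for pre-seeded files also takes security descriptors into account, so matching content here does not guarantee DFSR will treat the files as identical.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's content, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root: Path, target_root: Path) -> dict:
    """Classify every file under source_root as match, differ, or missing on the target."""
    result = {"match": [], "differ": [], "missing": []}
    for source_file in sorted(p for p in source_root.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_root)
        target_file = target_root / relative
        if not target_file.is_file():
            result["missing"].append(relative.as_posix())
        elif file_hash(source_file) == file_hash(target_file):
            result["match"].append(relative.as_posix())
        else:
            result["differ"].append(relative.as_posix())
    return result


def mismatch_percent(result: dict) -> float:
    """Percentage of source files that differ or are missing on the target."""
    total = sum(len(paths) for paths in result.values())
    if total == 0:
        return 0.0
    return 100.0 * (len(result["differ"]) + len(result["missing"])) / total
```

Calling `compare_trees(Path(r"D:\Data"), Path(r"\\server2\D$\Data"))` (both paths hypothetical) and passing the result to `mismatch_percent` gives a figure comparable to the 65% mentioned above.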

    Any thoughts?

    http://blogs.technet.com/b/askds/archive/2007/10/05/top-10-common-causes-of-slow-replication-with-dfsr.aspx

  2. Hi Jacob, it definitely sounds like something is still messed up with your replication group. Have you checked the debug logs located at c:\windows\debug to see if they note any specific errors?

    Assuming that this 7 GB replication group is on a volume with no other replication groups, and since you’re only looking at 7 GB over a 400 Mbps pipe, I would do the following:
    - Delete your replication group
    - From an elevated command prompt, delete the corrupted DFSR database:
    Run to set permissions -> icacls "c:\system volume information" /grant Administrator:F
    Run to delete the database -> rd "c:\system volume information\dfsr" /s /q
    * with c:\ being the volume your replication group is sitting on
    - Pre-seed your data again to the second host with the Robocopy command described here: http://blogs.technet.com/b/askds/archive/2010/09/07/replacing-dfsr-member-hardware-or-os-part-2-pre-seeding.aspx
    - Re-add the replication group for both folder targets
    - Check using diagnostic reports or WMIC that the replication group has entered initial sync:
    wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo get replicationgroupname,replicatedfoldername,state
    A state of 2 means initial replication and a state of 4 means normal replication
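If there are many replicated folders, the state codes can be translated in bulk. The sketch below assumes the WMIC query above is run with `/format:csv` appended and its output saved as text; the sample rows, server name, and function name are invented for illustration. Only states 2 and 4 come from the steps above; the other names are from my recollection of the DfsrReplicatedFolderInfo class documentation and are worth verifying.

```python
import csv
import io

# State codes for DfsrReplicatedFolderInfo (2 and 4 as described above;
# the rest are assumed from the WMI class documentation).
DFSR_STATES = {
    0: "Uninitialized",
    1: "Initialized",
    2: "Initial Sync",
    3: "Auto Recovery",
    4: "Normal",
    5: "In Error",
}


def summarize_states(csv_text: str) -> list:
    """Turn WMIC CSV output into (group, folder, state name) tuples."""
    # WMIC tends to emit blank lines around the data, so drop them first.
    lines = [line for line in csv_text.splitlines() if line.strip()]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    summary = []
    for row in reader:
        code = int(row["State"])
        name = DFSR_STATES.get(code, f"Unknown ({code})")
        summary.append((row["ReplicationGroupName"], row["ReplicatedFolderName"], name))
    return summary


# Hypothetical sample output; real names and column order may differ.
SAMPLE = """
Node,ReplicatedFolderName,ReplicationGroupName,State
HUB01,Projects,Branch-RG,2
HUB01,Archive,Branch-RG,4
"""

for group, folder, state in summarize_states(SAMPLE):
    print(f"{group} / {folder}: {state}")
```

Reading columns by header name rather than position keeps the parser independent of whatever column order WMIC chooses.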

    Hope that helps, I’d be interested to know if you get it working.
