Anyone who has worked in Information Technology long enough has encountered users claiming that their files have “vanished”. Much like the elusive Sasquatch on Animal Planet’s Finding Bigfoot (*sarcastic eye roll*) or the superhero who is only invisible until you look at him, claiming that a file just “disappeared” on its own doesn’t usually carry much weight in the real world.
If you’re the jovial sort, you’ve probably gently chuckled to yourself and then assured the user that information doesn’t just disappear and set out to help them find their missing files. If you’re the more cynical type (think Nick Burns: Your Company Computer Guy), you likely came up with a colorful analogy regarding remedial users and computer lessons down at the local zoo.
However, when one of your largest clients, who happens to be extremely advanced in their understanding of the various technologies implemented at their firm, contacts you regarding disappearing files, you know that something has definitely gone awry.
Over the past few months, the user base at one of our client firms had been experiencing some extremely odd behavior in their AutoCAD/Civil 3D environment. Initially, it showed up as DWG files that seemed to disappear randomly and inexplicably. Upon further investigation, it was noticed that now and then users in one office could see the files on the network while a user in another office could not.
Eventually, users pinned this seemingly intermittent behavior down to the moment immediately after a save. It was most evident when a project team member working in an external reference (XREF) made changes and saved that file. Other project members referencing the updated file would receive a notification to reload the reference (XREFNOTIFY), but the reload would fail with a message that read “*Invalid*”. Often, users would not look for the file until later in the day, only to discover that it was missing, with no clear idea of why.
In the past, the Application Specialists here at Microsol Resources, including myself, had seen a similar issue in which files would become read-only immediately after a save. The two issues appeared to share an underlying cause (a save failing due to high network latency), even though they manifested with vastly different results. So although the symptoms were not identical, the behavior certainly *looked* like the network timing issue we had seen before.
To understand how a save can go wrong, it helps to know the process by which AutoCAD saves a drawing. I can’t find an Autodesk-specific source that documents the process below, but it does seem to be the de facto procedure when DWG files are saved:
- AutoCAD verifies the file lock that it created previously (either at file open or on last save).
- AutoCAD creates a new temporary file, and locks it. The current drawing information is written to the temporary drawing.
- AutoCAD deletes the .BAK file.
- AutoCAD sends a remove lock request for the original .DWG.
- AutoCAD tries to rename the .DWG to a .BAK. (General problem location: usually, when the read-only problem occurs, the server has not yet completed the remove lock request. The rename is then treated as a sharing violation, and the server denies the rename request.)
- AutoCAD unlocks the temporary drawing.
- AutoCAD renames the temporary drawing to the original drawing name. (The read-only problem can occur here as well: if the file lock is not removed before the rename request is made, there is a sharing violation and the rename request is denied.)
- AutoCAD then re-locks the original drawing name.
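To make the sequence above easier to picture, here is a minimal Python sketch of the same temp-file, .BAK, and rename dance. It is only an illustration of the steps as listed, not AutoCAD's actual implementation: the function name `save_drawing` is hypothetical, and the lock and unlock steps (1, 4, 6 and 8) are left out because file locking behaves differently from one platform to the next.

```python
import os
import tempfile
from pathlib import Path


def save_drawing(dwg_path: Path, data: bytes) -> None:
    """Mimic the temp-file / .BAK / rename sequence (locks omitted)."""
    bak_path = dwg_path.with_suffix(".bak")

    # Step 2: write the current drawing data to a new temporary file
    # in the same folder as the drawing.
    fd, tmp_name = tempfile.mkstemp(suffix=".tmp", dir=dwg_path.parent)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    tmp_path = Path(tmp_name)

    # Step 3: delete the previous backup, if there is one.
    bak_path.unlink(missing_ok=True)

    # Step 5: rename the current drawing to .bak.
    # From here until the next rename succeeds, no .dwg exists on disk.
    if dwg_path.exists():
        dwg_path.rename(bak_path)

    # Step 7: rename the temporary file to the original drawing name.
    tmp_path.rename(dwg_path)
```

The thing to notice is the gap between the two renames: for a brief moment the folder holds a .bak and a temporary file but no .dwg at all. On a local disk that window is tiny, while over a slow or misbehaving network link each of those requests can be delayed or denied.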
Understanding this process, it would seem that something about our client’s environment was interfering with this series of operations and causing the DWG files to be deleted. But the question remained: what exactly was causing this phenomenon?
Fortunately, our client discovered that this behavior was being brought about by an issue with version 2.1 of the SMB protocol. SMB (Server Message Block), simply put, is a network file-sharing protocol used on Microsoft networks, primarily for file and printer sharing. It allows client programs or applications to read, write, create, and update files on a remote server.
Well, as it turns out, our client’s issues began exactly when their network team upgraded the filers on one of their NetApp storage devices to NetApp OnTap OS 8.1.1 in an attempt to mitigate existing performance issues. The upgrade worked so well in alleviating the existing issues on this one server that all the remaining servers were upgraded as well. What the network engineers didn’t know was that this OS upgrade also included an upgrade to the SMB protocol. During the upgrade process, they came across a checkbox to “Enable SMB 2”. Since no minor version number was mentioned, they assumed that SMB 2.0 would be installed, when in fact SMB 2.1 was installed.
The network engineers, unaware that the protocol had been upgraded in their environment, observed that the missing-files issue had begun occurring and seemed to be more pronounced when users were in one office and the data they were working on was stored in a different office. It is important to note that the client also uses Riverbed accelerators on each end of their WANs (Wide Area Networks). While files created in applications such as Microsoft Word and Excel seemed more tolerant of the latency issue caused by SMB 2.1 and eventually saved correctly, AutoCAD files were occasionally deleted, likely because of the way the files are saved. NetApp has since released a more recent version of their NetApp OnTap OS (8.1.2) that rolls back the SMB protocol to version 2.0.
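If you suspect drawings have already gone missing this way, one rough check is to look for backup files that have lost their drawing. This rests on an assumption of mine, inferred from the save sequence described earlier rather than confirmed by Autodesk or by our client: a save that fails after the .DWG has been renamed may leave a .BAK behind with no matching .DWG. The helper name `find_orphaned_backups` is hypothetical, and a hit is only a lead worth investigating, not proof of a failed save.

```python
from pathlib import Path


def find_orphaned_backups(root: Path) -> list[Path]:
    """Return .bak files under root that have no sibling .dwg."""
    orphans = []
    for bak in root.rglob("*"):
        if not (bak.is_file() and bak.suffix.lower() == ".bak"):
            continue
        # Compare names case-insensitively, since Windows shares
        # do not distinguish SITE.DWG from site.dwg.
        siblings = {p.name.lower() for p in bak.parent.iterdir()}
        if bak.with_suffix(".dwg").name.lower() not in siblings:
            orphans.append(bak)
    return sorted(orphans)
```

Pointing it at a project folder, for example `find_orphaned_backups(Path("P:/Projects"))` (the path here is made up), returns a list of candidate backups. Renaming a .bak to .dwg is the usual way to open an AutoCAD backup, so work from a copy and leave the original in place.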
As of the time of this blog entry, our client’s issues seem to be resolved, but I will be pursuing a more robust conversation with the Autodesk team to gather any additional information on the problem, and I will edit this post if anything turns up. It should be noted that there are a couple of existing Autodesk whitepapers regarding performance issues with SMB 2.0 and sheet sets, but neither outlines the specific problem I detailed above.