Delayed write error

rh99
Posts: 10
Joined: Thu Nov 06, 2003 12:31 am
Location: Virginia, USA

Delayed write error

Post by rh99 »

Using Synchronize It! Version 3.4.1639 on WinXP SP2, I've gotten delayed write errors & system freezes when syncing (duplicate setting) 40 to 120 gig directory structures from one drive to another. This is for backing up my main digital photo database to a couple of external drives.

wndsync seemed to run well, but the job would eventually fail. A Windows delayed write error would appear, often with a message about possible corruption in $Mft, the NTFS master file table, yuck! Typically a couple of files written just before the crash would be partially written, or created with the same date & filesize as the source, but partly or entirely filled with zero bytes! Fortunately I discovered this.

Delayed write seemed like such an odd error, as I don't know of anything else which could have been touching the relevant files, and I wasn't yanking the drive cable out or anything! Hooking the external drive up via USB, FireWire & eSATA all had the same result. I spent weeks testing, thinking it was a defective drive, or a Windows system problem, or anything but wndsync. Ran SpinRite on the drive for days, etc. I am not aware of any user changes one can make to wndsync's buffers--don't know if that would be relevant or not. I considered trying to change some of the internal Windows buffer settings (a sometimes-mentioned solution to delayed write problems), but I didn't.

Finally, I tried a run with Total Commander's dir sync and that ran with no problems. So wndsync (or something in my PC weirdly interacting with it) is the guilty party according to this diagnosis. I had switched to wndsync because it was much faster for me, but those delayed write problems are a deal breaker.

grigsoft
Site Admin
Posts: 1673
Joined: Tue Sep 23, 2003 7:37 pm

Post by grigsoft »

Thank you for your report! You mentioned that Synchronize It! is faster - is this faster copying or faster folder listing? As far as I understand, you are using an external drive as the target? What kind of external drive is that?

rh99
Posts: 10
Joined: Thu Nov 06, 2003 12:31 am
Location: Virginia, USA

Delayed write error

Post by rh99 »

Faster is my perception, not the result of timing tests. Synchronize It! seemed faster generating the preview, and that is in comparison to TC's sync preview--however, I have TC's copy/delete buffer size set to "big file mode" with a small buffer of only 64k. That seems very small to me, but I haven't yet tested how it performs with a larger buffer. I'm happy to have a sync which seems to work safely. Neither wndsync nor TC sync appears to use much of my system memory while in operation. Also, everything now seems slower to me, as originally I was using wndsync comparing file date & size only (thinking that was safe enough), but now I'm running everything in test content mode for safety.

The external drive is a HITACHI Ultrastar 750GB 7200 RPM SATA 3.0Gb/s Hard Drive. Installed in AMS Venus DS3 3.5" USB2.0 (type B) + SATA External Enclosure. Ultrastar is the higher, server-quality of Hitachi drives (This is the only Ultrastar drive I have, I have several of the Deskstars, Hitachi's workstation version). I have 4 or 5 of the Venus enclosures & find them the most reliable I can get.

grigsoft
Site Admin
Posts: 1673
Joined: Tue Sep 23, 2003 7:37 pm

Post by grigsoft »

OK, I will research this. Is the error reported within wndsync, or just by Windows? Does wndsync show any errors itself during file copying?

grigsoft
Site Admin
Posts: 1673
Joined: Tue Sep 23, 2003 7:37 pm

Post by grigsoft »

Do you have compatibility mode enabled in TC for that drive?

rh99
Posts: 10
Joined: Thu Nov 06, 2003 12:31 am
Location: Virginia, USA

Post by rh99 »

Compatibility mode is not enabled in TC.
I never had an error come from wndsync (that's why it never occurred to me that it might be the problem). The error always comes from windows. One example I grabbed: Image
Other times the error from Windows included the mention of possible damage to $Mft, but I wasn't able to grab that--the system was frozen.

I've uploaded some files illustrating exactly where one of the sync jobs failed with the delayed write error. Don't know if it might be useful or not. The sync was being run on a 43gig directory structure so I've just put up files where things crashed; maybe there's a clue there. See the directories at:
http://www.aptmeans.com/gallery2/sync-check/
Source has files from the source directory, Dest has files from the destination directory. (I noticed that ftping the directories changed the file dates, so I also uploaded a zip file of the two directories as sourcedest.zip.)

All these files on the dest side had the identical dates & filesizes to their version in the source side (I somehow munged up one filedate when I discovered the file differences or was saving the example). The last file placed in Dest was Img_8516.cr2, apparently where things crashed. The earliest file I included is Img_8511.cr2--this one is identical on both sides. The next couple of files: 8512, 8513 are different from source to dest, they seem to be identical up to 2/3 of the file, then the dest is just filled with zero bytes. Finally, 8516 on the dest side is simply completely filled with zero bytes!
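The pattern described above (dest identical to source up to some point, then zero bytes to the end) can be checked mechanically. Below is a minimal Python sketch, not anything wndsync itself does; the function names and the 1 MiB chunk size are assumptions chosen for illustration. It reports the first differing byte offset and whether the destination is zero-filled from that point on.

```python
from pathlib import Path
from typing import Optional

CHUNK = 1024 * 1024  # compare 1 MiB at a time (arbitrary choice)


def first_difference(source: Path, dest: Path) -> Optional[int]:
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(source, "rb") as s, open(dest, "rb") as d:
        while True:
            a = s.read(CHUNK)
            b = d.read(CHUNK)
            if a != b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file is shorter.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended without a difference
            offset += len(a)


def zero_filled_from(path: Path, start: int) -> bool:
    """True if every byte of `path` from `start` to the end is zero."""
    with open(path, "rb") as f:
        f.seek(start)
        while chunk := f.read(CHUNK):
            if chunk.count(0) != len(chunk):
                return False
    return True
```

For a pair like the 8512 files, one would call first_difference(source, dest) and, if it returns an offset, pass that offset to zero_filled_from(dest, offset) to confirm the zero-filled tail.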

Perhaps you can tell something from these fragments. I don't know if wndsync or windows was responsible for allocating the "seemingly correct" files on the destination side. Is there some way for wndsync to operate in a failsafe mode where destination files are never initially allocated with exact file name, size & date--so if things fail it is obvious without doing a content check of each file? (e.g. have a special funny character begin the dest file name till we're sure it's actually copied?)
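To make the "funny character" idea concrete, here is a rough Python sketch of a copy that writes under a temporary name and only renames to the real name once the data has been flushed. This is an illustration of the general technique, not how wndsync works; the "~" prefix, the ".partial" suffix and the function name are made up for the example. Note also that a flush request cannot promise that every enclosure or controller really committed the data.

```python
import os
import shutil
from pathlib import Path


def copy_via_temp_name(source: Path, dest: Path) -> None:
    """Copy source to dest so that dest only appears once the copy completed."""
    # Hypothetical naming scheme: "~name.partial" marks an unfinished copy.
    temp = dest.with_name("~" + dest.name + ".partial")
    with open(source, "rb") as s, open(temp, "wb") as t:
        shutil.copyfileobj(s, t, 1024 * 1024)
        t.flush()
        os.fsync(t.fileno())  # ask the OS to push cached data to the device
    # Cheap sanity check before the file is given its real name.
    if temp.stat().st_size != source.stat().st_size:
        raise OSError(f"size mismatch after copying {source}")
    shutil.copystat(source, temp)  # copy timestamps only after the data is in place
    os.replace(temp, dest)         # rename over the final name as the last step
```

If the job dies partway through, what is left behind is a "~....partial" file, so a failed copy is obvious from the directory listing without a content check.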

grigsoft
Site Admin
Posts: 1673
Joined: Tue Sep 23, 2003 7:37 pm

Post by grigsoft »

OK, I may try to rewrite the copy procedure and offer it as a compatibility mode. Meanwhile, could you please try this version: http://www.grigsoft.com/wndsyncbu.zip
In Options>Settings you will find a new "Use compatibility mode" option. Check it, and specify your drive letter. wndsync will then make short delays while copying large files. Does it help? If it does not, you can also try to increase the delay length by using this syntax in the compatible drive list in options: "D+100". The number after + is a delay in ms. Try the default first, then change it to 50 or 100 - a value of 100 will slow down the copying process significantly, but the question is whether this helps at all.
Thank you in advance!

rh99
Posts: 10
Joined: Thu Nov 06, 2003 12:31 am
Location: Virginia, USA

Post by rh99 »

I guess you set compatibility mode for just the destination drive. It adds a delay somewhere in the file write process?

Total Commander's description of this is "The compatibility mode is useful for special drives, which cause problems with the default or big file mode, e.g. USB memory sticks."

Don't think I ever needed it before, at least I have never, ever used compatibility mode in TC.

I'll give the 1644 version a try per your suggestion. May take a day or so as I'd like to set something up that adequately tests things, but won't mess up any important files.

rh99
Posts: 10
Joined: Thu Nov 06, 2003 12:31 am
Location: Virginia, USA

tests with v 1644 successful so far!

Post by rh99 »

I ran tests with the 1644 version you provided. A couple of smaller syncs which copied 3-5 gigs worked successfully, with "compatibility mode" checked and unchecked. Most of the file copies just zipped along quickly, but some of them seemed to stall for several seconds partway through, with the drive making a poketa, poketa, poketa sound and showing this progress stalled: Image
But this was only for several seconds and things proceeded without error.

I was a bit skeptical that things were working okay, so I went back and did a slightly larger test with version 1639, which I had been using before. Ugh. Image This failure made my backup drive mostly unrecognizable to the operating system, and I had to reboot to get things straightened out.

I then tested the files which had been copied with 1639 before it failed and again many of them in the destination directory were size & date identical, but content different from the source versions.


So then I went back and did a BIG 40 to 50 gig sync job with the new version 1644 with "compatibility mode" checked. Everything seemed to work correctly. Still holding my breath, but maybe something in 1644 has indeed fixed the problem.

I'm wondering what exactly is different in 1644? Is 1644 stable/recommended to use and is it going to now be the next released version, or do you still have some tweaks you plan to do?

I'm quite happy the problem may be fixed!

grigsoft
Site Admin
Posts: 1673
Joined: Tue Sep 23, 2003 7:37 pm

Post by grigsoft »

Well, from what I have learned about this error, my guess is that it is caused by copying too fast, so that the target controller cannot handle the situation correctly. As I described, this version inserts a small delay after every MB copied, letting the controller do its job. That is the only difference. Still, there is more to test - maybe it would be enough to insert a pause just between files. I'm going to do additional tests myself.
Anyway, you can use this version; just be sure to let me know if you have a problem again.
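For readers who want to picture the "delay after every MB" idea, the following is a rough Python sketch of the general technique, not the actual wndsync code. The function names, the 10 ms default delay and the way the "D+100" entry is parsed are all assumptions made for illustration; the thread does not state the real default.

```python
import time
from pathlib import Path
from typing import Optional, Tuple

MB = 1024 * 1024


def parse_drive_entry(entry: str) -> Tuple[str, Optional[int]]:
    """Split a list entry such as "D+100" into ("D", 100).

    An entry without "+" yields (drive, None), meaning "use the default delay".
    This reading of the syntax is an assumption based on the description above.
    """
    drive, sep, delay = entry.strip().partition("+")
    return drive, (int(delay) if sep else None)


def throttled_copy(source: Path, dest: Path, delay_ms: int = 10) -> int:
    """Copy in 1 MB chunks, pausing delay_ms after each chunk.

    Returns the number of bytes copied. The 10 ms default is a placeholder.
    """
    copied = 0
    with open(source, "rb") as s, open(dest, "wb") as d:
        while chunk := s.read(MB):
            d.write(chunk)
            copied += len(chunk)
            time.sleep(delay_ms / 1000)  # give the target time to catch up
    return copied
```

With a 100 ms pause per MB, a 1 GB file spends roughly 100 seconds just sleeping, which matches the warning that a value of 100 slows copying significantly.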

rh99
Posts: 10
Joined: Thu Nov 06, 2003 12:31 am
Location: Virginia, USA

Post by rh99 »

Yes, the few internet resources about delayed write errors (even Microsoft's comments) do not offer much help; only limited understanding about very specific problems.

I would like to think that the operating system would handle any needed delay. At least where there's no hardware problem or user yanking a usb stick out. But apparently it doesn't.

Could it be that the OS hasn't advanced with the hardware? Possibly the Windows write routines are much the same as those in Win NT (or earlier?). In NT days, hard disks were dozens of MB and a big write job was hundreds of KB. Now I have a 750GB drive and am writing 50GB at a go. I wonder if the write routines in Windows Server are different. Also, I wonder if some BIOS tweaking would help (e.g. I use LBA, but not 32-bit transfers).

Let us know the results of your testing.

& Thanks again.

LanaiLizard
Posts: 16
Joined: Sat Dec 13, 2008 12:44 pm

Lost network connection

Post by LanaiLizard »

I think I have seen the same root issue but with a different symptom.

While synchronizing a folder with very large files (1GB-2GB) I would suddenly lose routability on the network connection that Synchronize It! was running over. The copy would halt and any remaining files would be skipped with an appropriate error message about them not being copied.

This was over a wireless network connection to a wired Windows Home Server box. The wireless connection appeared to still be fine though, complete with IP address and gateway information but essentially no traffic would pass anymore. I could not even ping the first hop. I'm still not sure exactly what happens to the network connection because it "looks" ok on the surface.

If I click "Disconnect" and then "Connect" on my wireless connection to re-establish the network connection I will be back with a working network connection again.

I tried your V3.4.0.1644 beta/patch noted in this thread with the default compatibility delay for the destination drive and that seemed to fix the problem. It failed every time with both V3.1 and V3.4.0.1640 but works fine with V3.4.0.1644 in compatibility mode!

Oddly, there may be another problem though because the original copy halted in the middle of a file copy so the destination file would be incomplete of course. But if I turned around and did a Synchronize It! Preview using compare by "Content" method (instead of "date + size") it claimed the two files were equal although they were not. I verified they were not equal with another program (DOS's FC command).

This is very troubling. I have used Synchronize It! for a long time and found it to be a truly indispensable tool, and this is the first serious issue I have encountered.

Hopefully this information will be helpful for the next update.

Thanks,
Jesse

rh99
Posts: 10
Joined: Thu Nov 06, 2003 12:31 am
Location: Virginia, USA

Delayed write error

Post by rh99 »

I agree, LanaiLizard, it is troubling to have one's backup process fail in two of the worst possible ways: a) some copied files might have the proper filesize & date, yet have incorrect content, and b) the error hangs one's system. Hanging the system, or your network connections, would seem to be a symptom of some process writing outside an application's legal address space, I guess? I would surely like to have a backup fail more gracefully, and especially fail "safe" by NOT leaving copied files which 'look correct' but which aren't. Nonetheless, I have continued to have no problems with the updated sync it, version 1644, so that's what I continue to use. Nothing else is as convenient for my needs. I would prefer it if I could fully trust doing syncs just checking filesize/date, because checking content is so much slower.

From what I can gather, and I'm just guessing, the problem "seems" to be at a primitive level within Windows' write routines and not something that application programmers like grigsoft can easily control. I infer that from the way the Total Commander folks seem to have addressed the problem with sort of a hack, by offering the choice of a slow write process and allowing the user to add further delays to slow things down even more if necessary. To do more, maybe app programmers would have to substitute a better/different error handler at a very primitive level--sounds difficult and like something the OS write routines should be responsible for.

LanaiLizard
Posts: 16
Joined: Sat Dec 13, 2008 12:44 pm

Post by LanaiLizard »

Bob99, I am really glad to hear you have not had any problems with V1644. That gives me a little more confidence, but I still don't like using a special patch release for such a critical application. It doesn't seem the patch release was signed, as on startup I get the Windows dialog that says "The publisher could not be verified. Are you sure you want to run this software?"

I'm not sure whether you understood that I had a third major failure in that when I used the Compare by "Content" to check to see if my copies were correct or which ones it had failed to copy correctly it reported that everything had copied just fine when it hadn't.

I understand your comments that the root issue may actually be within the low-level Windows copy routines, which makes it a real challenge for any application programmer to work around. If so, then is my data at risk whenever I move large files around using Windows in general? Yikes!

This is sounding similar to the massive Microsoft failure in the first release of Windows Home Server that could randomly scramble your data whenever files were written to the server. It was apparently a rare event, but if the timing of things was just right then you lost data. No copy routine can afford to be less than 100% reliable. A speedy 99.99% is no good. I'll take a slower 100% copy any day.

Jesse

rh99
Posts: 10
Joined: Thu Nov 06, 2003 12:31 am
Location: Virginia, USA

Post by rh99 »

I can't say what might be behind your third problem -- with Compare by "Content." The types of problems I had due to delayed writes screwed my system up such that I could not trust much about things without rebooting. Anything which damages the $Mft, master file table for a volume, is serious & leads me to do a system restart asap! I haven't had any such low level pc errors like that in decades.

It sounds like you did get an error msg, so if you run 1644 & get no error msg, your copies are probably safe. My strategy is to just be more skeptical and check up after the backup process for a while. A pain, but what else can one do. I check using compare routines in total commander, or compare it! or whatever. Probably I should build a verify step into my use of sync it. I guess that's why most of my backup apps like Nero copy to dvd, or Genie Backup Manager typically want to verify a backup each time they create one.
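A verify step like the one mentioned above could look roughly like this Python sketch, which hashes every file under the source and compares it with its counterpart under the destination. It is only an illustration under my own assumptions (the function names and the choice of SHA-256 are mine), not a feature of sync it, TC or any other tool named here.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """SHA-256 of a file's content, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_tree(source_root: Path, dest_root: Path) -> List[str]:
    """Return relative paths that are missing or differ in content under dest_root."""
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = dest_root / rel
        # Date and size are deliberately ignored: only content counts here.
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every source file has a content-identical copy. One caveat: right after a copy, the destination may be read back from the OS cache rather than from the disk itself, so verifying after reconnecting the drive or rebooting is the more convincing test.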
