I like Xanthir's theory, but cannot confirm it. If that is the case, we could poke at it and see what clears the 'from the internet' tag, and/or experiment with what would stop the explosion of the merge document.
To make the bad file good, all I have to do is open it in GVim and save it. Saving it in Notepad does not make it good. I've tried every combination of line endings under the sun.
These are somewhat tangential but... I'm assuming by 'every combination of line endings' you mean "CRLF" vs "CR"? (and not, say, "go to top left of the page", then a set of LF until you reach the right line number...)
Does it have an empty final line? (Some programs behave poorly, in really quirky ways, if you don't have that empty final line.)
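If it helps to settle the line-ending and final-line questions, here is a rough Python sketch that counts each style of line ending in the raw bytes and reports whether the data ends with one. The function name is made up for illustration, and it only looks at bytes, so it says nothing about any flag stored outside the file contents.

```python
def line_ending_report(data: bytes) -> dict:
    """Count each line-ending style in raw bytes and note how the data ends."""
    crlf = data.count(b"\r\n")
    # A lone CR or lone LF is one that is not part of a CRLF pair.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {
        "CRLF": crlf,
        "CR": cr,
        "LF": lf,
        "ends_with_newline": data.endswith((b"\r\n", b"\n", b"\r")),
    }
```

Run it on the bad file and on the copy saved from GVim (read each with open(name, "rb").read()) and compare the two reports. If they match, the difference is probably not in the line endings at all.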
If I FTP this file down using either ASCII or Binary mode, it works fine. If I right-click and Save As from my browser (happens in IE and Firefox), it blows up my merge document (Word 2007).
A detail you are missing -- clearly the Save As didn't blow up the merge document. It was the operation afterwards -- when you used (Word 2007) to merge it into a previous version of the same document, I'm guessing?
What happens if you copy the file (to a new name) using windows explorer? A dos command line? Notepad, where you highlight the entire file, copy, then open a new notepad, paste, and save it out under a different name? (I could see gvim saving when you tell it to save, while notepad being all smart and saying "nothing to do here, I won't save it out" -- and I could see gvim deleting the file then recreating it, while notepad might reopen the file for writing without deleting it, or other minor differences.)
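A scripted version of the "copy everything into a new file" experiment might look like the sketch below. It reads the bytes of the original and writes them to a brand-new file, so the contents are identical but nothing else about the original file comes along. The function name is hypothetical, and I am assuming that a file created this way carries no "from the internet" marking, which is worth checking rather than trusting.

```python
def copy_contents_only(src: str, dst: str) -> int:
    """Write the bytes of src to a brand-new file dst; return the byte count."""
    with open(src, "rb") as f:
        data = f.read()
    # Mode "xb" refuses to overwrite, so dst is always a freshly created file.
    with open(dst, "xb") as f:
        f.write(data)
    return len(data)
```

If the new file merges cleanly, the problem is not in the bytes of the file, which would fit the theory about the flag.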
...
Here is information on the "from the internet" flag:
http://www.howtogeek.com/70012/what-cau ... remove-it/
Sysinternals streams:
http://technet.microsoft.com/en-us/sysi ... s/bb897440
(I already had it from downloading some sysinternals bundle of file system utilities)
You can use streams to see if the operation that cleans the file also removes the "from the internet" stream.
I'm wondering if the "from the internet" flag must cause the problem, or if there is a way to make it take the "from the internet" file and process it correctly by changing its contents.
Assuming you save your sysinternals utilities to c:\bin,
c:\bin\streams -d $filename$ will delete the streams associated with $filename$.
You can also see streams by typing dir /r (without the streams command). I haven't spotted how to delete ADS (alternate data streams) without the streams command (or using the same API it uses)...
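For what it's worth, one possible route without the streams command is to address the stream by name from a script. The sketch below is in Python and rests on two assumptions: that the "from the internet" flag lives in a stream called Zone.Identifier, and that on NTFS a path of the form filename:streamname opens or deletes just that stream. The helper names are invented for illustration.

```python
import os

# Assumed name of the stream holding the "from the internet" flag.
ZONE_STREAM = ":Zone.Identifier"

def read_zone_stream(path: str):
    """Return the text of the zone stream, or None if the file has none."""
    try:
        with open(path + ZONE_STREAM, "r") as f:
            return f.read()
    except OSError:
        return None

def delete_zone_stream(path: str) -> bool:
    """Delete the zone stream if present; return True if one was deleted."""
    try:
        os.remove(path + ZONE_STREAM)
        return True
    except FileNotFoundError:
        return False
```

Calling read_zone_stream before and after the GVim save would show whether saving removes the stream, and delete_zone_stream followed by a merge would show whether the stream alone is what makes the merge blow up.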
...
I wonder what it is that Word is reading from the ADS that makes the merge blow up. I'd be tempted to look at the file access involving the file in question, and maybe do a diff on the resulting logs. (procmon from sysinternals can be told to tell me about everything that reads a file called "foobar".) That would generate more information, but "delete the ADS" is probably the real answer to the practical problem.
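The diff itself could be done with any diff tool, but as a sketch, here is one way in Python using difflib. It assumes the two logs have already been exported as lists of text lines (one from a good merge, one from a bad one). The key parameter is a hypothetical hook for stripping whatever varies between runs, such as timestamps, since I am not assuming any particular log layout.

```python
import difflib

def diff_logs(good_lines, bad_lines, key=lambda line: line):
    """Return a unified diff of two logs after normalising each line with key."""
    good = [key(line) for line in good_lines]
    bad = [key(line) for line in bad_lines]
    return list(
        difflib.unified_diff(good, bad, fromfile="good", tofile="bad", lineterm="")
    )
```

Lines starting with "+" appear only in the bad run, so any extra access to the file or its streams should stand out there. If each line began with a time column followed by a comma, something like key=lambda line: line.split(",", 1)[1] would drop it before comparing.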