
LessThanDot

A Technical Community for IT Professionals

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary.

All times are UTC [ DST ]

Is Disk Rename/Move Atomic?

Postby Emtucifor on Thu Jul 30, 2009 7:57 pm

I have a question about how Windows handles renames or moves of files. For the impatient, read just the bold parts. :)

Some background: I worked with a system that printed to a paused PostScript printer whose spool directory was on another computer. This PostScript output was then used as the input for another program. A problem that occurred sometimes was that if the print job was large, it could take many minutes to create the spool file. During that whole time, the process that was polling for files (in this and many other directories) would be completely stuck, instead of moving on to other work it could do.

I took away from this that I should be aware of the race conditions that can occur when writing files over a network, especially if there is processing time or a long transmission time involved. I reasoned that the send should be to a temp directory, and then when complete the file could be moved to the correct input folder.
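A minimal Python sketch of that write-then-move idea, assuming the temp directory and the input folder are on the same volume. The function name and folder layout are illustrative only, not taken from any existing system:

```python
import os
import tempfile


def publish(data: bytes, staging_dir: str, inbox_dir: str, name: str) -> str:
    """Write data to a temp file in staging_dir, then move it into inbox_dir.

    Assumes staging_dir and inbox_dir are on the same filesystem, so the
    final os.replace() is a rename of the directory entry, not a copy.
    """
    # mkstemp creates a uniquely named file that only we know about
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the move
        final_path = os.path.join(inbox_dir, name)
        os.replace(tmp_path, final_path)  # the only step the poller can observe
        return final_path
    except BaseException:
        # Don't leave a half-written file behind in the staging area
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The poller then only ever looks at inbox_dir, where a file name never appears until its contents are complete.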

Note: one partial answer would be for the receiving process, instead of polling for files, to use Windows folder change notifications and look at something only on a "file saved" event. But I don't think this would completely solve the problem. In any case, I have not delved into this kind of notification; I am using ye olde VBScript Scripting.FileSystemObject GetFolder("path").Files collection, with a WScript.Sleep 500 to wait half a second between checks.
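For comparison, roughly the same half-second polling loop sketched in Python. Here handle is a hypothetical callback standing in for whatever processes each file. Note that it has exactly the weakness described above: it will happily report a file that is still being written.

```python
import os
import time


def poll_once(directory: str, seen: set) -> list:
    """Return the names of files in directory not seen before, and remember them."""
    new = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file() and entry.name not in seen:
                seen.add(entry.name)
                new.append(entry.name)
    return sorted(new)


def poll_forever(directory: str, handle, interval: float = 0.5) -> None:
    """Call handle(path) for each new file, checking every `interval` seconds."""
    seen = set()
    while True:
        for name in poll_once(directory, seen):
            handle(os.path.join(directory, name))
        time.sleep(interval)  # the counterpart of WScript.Sleep 500
```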

So now the current situation: I am setting up a process where I poll for a file in a directory at half-second intervals (mostly because I can, and also because during testing I want snappy responses). I asked the person who is FTPing the file to the directory to put it in a temp directory, then move it to the "in" folder. He objected and said he'd rather keep his process simple and write directly to the in folder, and that I should just poll every five minutes.

Now, we have a philosophical difference here. I am of the opinion that it's better to avoid all collisions and errors (if possible) than to bank on "it probably won't happen" because I think that's sloppy and will EVENTUALLY lead to a problem. Even if the problem is nothing more than a false error alert in our enterprise alert system (when perhaps the input file is locked upon trying to read), this seems, well, like I said, sloppy. So other than trying to do a bunch of tests and still not being sure what the answer is, maybe someone knows a little bit about this and can help answer:

Oh, by the way, FTP doesn't have a "move" command, but it does have rename (the RNFR/RNTO command pair), and it supports renaming into a different directory, so it's effectively a move.
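Python's standard ftplib exposes this as FTP.rename(), which sends RNFR and then RNTO. A sketch of the sender side under that approach, assuming the server allows renaming across directories; the temp and final names are hypothetical. It uses an ASCII (TYPE A) transfer via storlines(), matching the text-file setup in this thread:

```python
from ftplib import FTP


def upload_then_rename(ftp: FTP, local_path: str, tmp_name: str, final_name: str) -> None:
    """Upload under a temporary name, then RNFR/RNTO it to the name the poller watches."""
    with open(local_path, "rb") as f:
        # ASCII-mode STOR to a name (or directory) the poller ignores
        ftp.storlines(f"STOR {tmp_name}", f)
    # rename() sends RNFR tmp_name followed by RNTO final_name
    ftp.rename(tmp_name, final_name)
```

The point of the design is that the poller's directory never contains a partially transferred file; it only sees the name once RNTO has completed.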

So, will an FTP RNTO operation ever allow a race condition like this:

1. RNTO begins and a directory entry for the file becomes available to the OS.
2. A polling process finds the file, tries to read it, and is either denied (assuming the file is locked) or reads a somehow corrupted or incomplete file.
3. RNTO completes and the file is now readable correctly.
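One way to probe this empirically is a small stress test: one thread keeps writing a new version to a temp file and renaming it over the target, while the main thread keeps reading the target and counts any content that is neither a complete old version nor a complete new one (the torn read of step 2). This Python sketch is evidence, not proof: a clean run doesn't guarantee anything, and on Windows os.replace() can fail with a PermissionError while the target is held open, which the sketch surfaces as an exception rather than hiding.

```python
import os
import tempfile
import threading


def race_check(iterations: int = 500, size: int = 16 * 1024) -> int:
    """Replace a file repeatedly while reading it; return the number of torn reads."""
    payloads = [bytes([65 + i]) * size for i in range(26)]  # 26 distinct full versions
    valid = set(payloads)
    errors = []
    done = threading.Event()
    bad = 0
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "in.txt")
        with open(target, "wb") as f:
            f.write(payloads[0])

        def writer():
            try:
                for i in range(iterations):
                    tmp = os.path.join(root, f"job{i}.part")
                    with open(tmp, "wb") as f:
                        f.write(payloads[i % 26])
                    os.replace(tmp, target)  # rename over the file being polled
            except OSError as exc:
                errors.append(exc)
            finally:
                done.set()

        t = threading.Thread(target=writer)
        t.start()
        while not done.is_set():
            with open(target, "rb") as f:
                if f.read() not in valid:
                    bad += 1  # neither a complete old nor a complete new version
        t.join()
    if errors:
        raise errors[0]
    return bad
```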

I am skeptical that a move or rename would ever present a locked, corrupted, or incomplete file to a reading process, because to my understanding, only the directory entry is rewritten in a move or a rename. No data is moved in the allocated data blocks of the file on the disk. Right? Perhaps locking could occur if the directory entry takes time to write?
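The "only the directory entry is rewritten" intuition can be checked directly: within one filesystem, a rename keeps the file's identity (device plus inode number), meaning the data blocks are not copied. A Python sketch; on POSIX st_ino is the inode number, and recent Python versions on Windows report the NTFS file ID there as well.

```python
import os


def rename_keeps_identity(src: str, dst: str) -> bool:
    """Rename src to dst and report whether the file kept the same on-disk identity."""
    before = os.stat(src)
    os.replace(src, dst)
    after = os.stat(dst)
    # Same device and same inode: only the name (directory entry) changed,
    # the file's data was not rewritten or moved.
    return (before.st_dev, before.st_ino) == (after.st_dev, after.st_ino)
```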

And given what I know about old DOS interrupt vectors (making sure a vector address is updated in a single operation, so that another interrupt can't read a bad address after only some of its bytes have been written), and given that move has worked this way since DOS days, I am close to believing that step #2 above is impossible, though I realize interrupt vectors are a bit smaller than directory entries. It's just that if a process can read a directory entry, hasn't the entry already been rewritten and the file already been moved? And if no directory entry can be read yet, then there is no problem.

Your input desired and welcome!
God cries a little bit every time someone builds a database.
Emtucifor
Guru
LTD Gold - Rating: 1033
Posts: 2835
Joined: Fri May 30, 2008 9:30 pm
Location: Bellingham, WA

Re: Is Disk Rename/Move Atomic?

Postby damber on Thu Jul 30, 2009 11:25 pm

Edit: Dear El Beardy One... ;-)

FTP is horrible. But still...

Option 1: Register/Listen for Windows File Event triggers
I couldn't find a very good example, but with something like this: http://www.dotnetspider.com/resources/2 ... tcher.aspx you can watch for file events (e.g. renamed, created, etc.). This would be the easiest and least resource-hungry way to do it: awaiting an OS callback for a registered listener makes this event-driven rather than polled. One caveat: a "created" event can fire as soon as the file appears, before the writer has finished, so it's the rename event that really signals completion. Just noticed you're using VBScript (sorry for you), in which case you might want to see what WMI can do for you: https://blogs.msdn.com/wmi/archive/2009 ... reate.aspx though to be honest, you should probably just succumb to writing this in C# / Java or something.

Option 2: Use FTP Server to trigger code execution or poll transaction log
Depending on the FTP server, you could set it to execute your program on completion of the transfer; most decent FTP servers offer this feature. If you're stuck with a crappy one (e.g. the MS standard junk), you can always poll the transaction/event log for a completed transfer, which should show the start and end of the transfer. Polling and parsing the event log is a bit icky though :-( I've often done this on Unix by piping stdout to the relevant program, which saves polling and drip-feeds based on events, though I doubt you can set this up very easily with the standard Windows FTP server.

Option 3: Use Renames
Polling a directory for a name pattern and having the sender rename the file after a successful upload is one of the most basic workarounds. As far as I know, on Windows it is safe to assume the rename is 'faster' than your program (read: atomic, in this case) as long as it is a rename/move on the same drive/partition (i.e. one that uses the same master file table, $MFT). The filesystem then simply changes the reference to the file and doesn't need to move the actual bits on the disk, which should be atomic from your perspective: as you say, it updates the $MFT metafile before presenting the file as available through the filesystem API, which also reads from that table. Though it is Windows, so who knows... ;-)
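A Python sketch of both halves of that convention: a check that two directories are on the same device (so the rename can be a pure metadata change), and a reader that only picks up final names. The .part/.txt naming is a hypothetical convention the sender would have to follow. Worth knowing: os.replace() between different filesystems may simply fail rather than copy, whereas shutil.move() falls back to copy-then-delete, which is not atomic.

```python
import os


def same_volume(path_a: str, path_b: str) -> bool:
    """True if both paths live on the same device, so a rename between them can be atomic."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def ready_files(inbox: str, final_suffix: str = ".txt") -> list:
    """List files the sender has finished with, skipping in-progress names like *.part."""
    with os.scandir(inbox) as it:
        return sorted(e.name for e in it
                      if e.is_file() and e.name.endswith(final_suffix))
```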

Of course, you could just join the 21st century and try a computer-to-computer transport protocol instead of a human-to-human one ;-) Wait until flaky packet loss and hung data socket connections bite you in the a$$, or you talk to a non-MS platform without thinking about the correct conversions :-)

Hope that helps,
a smile is worth a thousand kind words, so smile, it's easy! :-)


CODE: $5
WORKING CODE: $500
PROPERLY DESIGNED & WORKING CODE: Priceless
damber
LTD Admin
LTD Silver - Rating: 663
Posts: 3138
Joined: Tue Oct 09, 2007 1:48 pm
Location: North Wales, UK

Re: Is Disk Rename/Move Atomic?

Postby Emtucifor on Thu Jul 30, 2009 11:50 pm

I can see I need to change my avatar if I'm going to get ribbed about it. Haha.

I will look into the file event triggers, as you mentioned (and I mentioned). I can use them from VBScript by registering a COM+ object, or I'll just bite the bullet and write a C# app or something. But what's valuable in this organization, which is not full of skilled programmers, is easily accessible script that can be tweaked by someone with a little less skill. I do somewhat detest polling and wish it could be COM, or at least event notification or something like that.

The sending system actually happens to be UNIX, hence the FTP method instead of UNC file shares. But these are text files, and using ASCII transfer converts the Unix LF line endings into the CRLFs that I expect.
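What ASCII (TYPE A) mode does to the line endings in this case is roughly the following, sketched in Python; a binary-mode transfer would leave the bytes untouched:

```python
def to_crlf(text: bytes) -> bytes:
    """Convert Unix LF line endings to Windows CRLF, leaving existing CRLF pairs alone."""
    # Collapse CRLF to LF first so we never produce CR CR LF, then expand every LF.
    return text.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```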

Now, if you know of a way for Perl on UNIX to easily query a SQL Server database, that would be great, because then I'd tell my colleague and harass him for his lack of skillz, since I, the non-Perl-knowing person, had to edumucate him on how to do it.

His interface data doesn't have employee ID, so he's sending me data with SSN in it and I'm parsing it and doing a db lookup to switch it to employeeID, then he's taking the file back. The whole clunky ftp->vbscript->onward thing is because he thinks it's too complicated for him to do the db lookup and I told him I could have it done quickly (which I did).
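The lookup step might look something like this in Python, assuming a hypothetical pipe-delimited record with the SSN in the first column, and the SSN-to-EmployeeID mapping already loaded into a dict from the database. Records with no match are reported rather than passed through, so an SSN never slips into the output:

```python
def swap_ssn_for_empid(lines, ssn_to_empid, ssn_col=0, sep="|"):
    """Return (converted records, line numbers with no matching SSN).

    Each converted record has its SSN field replaced by the employee ID.
    """
    out, unmatched = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(sep)
        emp_id = ssn_to_empid.get(fields[ssn_col])
        if emp_id is None:
            unmatched.append(lineno)  # drop it rather than forward the SSN
            continue
        fields[ssn_col] = emp_id
        out.append(sep.join(fields))
    return out, unmatched
```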

P.S. Suggest some computer-to-computer transmission methods for me?

Re: Is Disk Rename/Move Atomic?

Postby damber on Fri Jul 31, 2009 7:11 pm

Emtucifor wrote:I will look into the file event triggers, as you mentioned (and I mentioned). I can use them from vbscript by registering a COM+ object, or I'll just bite the bullet and write a c# app or something. But, of value in this organization not full of skilled programmers is easily-accessible script that can be tweaked by someone with a little less skill. I do somewhat detest polling and wish it could be COM or at least event notification or something like that.


Will the WMI stuff I linked above not help you with that? The scripting is a little more esoteric, but it's still scripting, it should be pretty obvious what it's doing, and it uses standard APIs. It would be the better way to go, IMHO.

Emtucifor wrote:Now, if you know of a way for perl on UNIX to easily query a sql server database, that would be great, because then I'll tell my colleague and harass him for his lack of skillz that I, the non-perl-knowing person, had to edumucate him on how to do that.


Perl makes my eyes bleed, not to mention my heart :-( I tend to wear special protective gloves when I have to touch something that has mutated in Perl ('cos you just never know!). But you can use Perl to talk to SQL Server from a *nix machine, though I think you'll need to buy a driver/component that supports unixODBC so Perl can talk to it through DBI with DBD::ODBC, or something like that. I think you can get a reasonable driver for free, but it's probably not as easy as the commercial version, so you'd need a bit more Perl knowledge etc. Example with a commercial driver: http://www.easysoft.com/developer/langu ... orial.html - mmm, this is about the best I can find in a short search: http://answers.google.com/answers/threa ... 32277.html But I'm not a Perl expert, so I suspect if you know what you're doing, this should be pretty easy.

Emtucifor wrote:His interface data doesn't have employee ID, so he's sending me data with SSN in it and I'm parsing it and doing a db lookup to switch it to employeeID, then he's taking the file back. The whole clunky ftp->vbscript->onward thing is because he thinks it's too complicated for him to do the db lookup and I told him I could have it done quickly (which I did).


Would it not be better to replicate the xref data to the Unix server and cache it there, so the IDs are translated near the source of the data? You could push it out either in batches, if you don't mind the odd delay in ID updates, or on change events (e.g. a new SSN<->EmpID reference created/updated/deleted). This would push the work of fixing their incorrect codes back to the source of the problem. It would save all the back-and-forth nonsense, and they could query that data to their heart's content without further impacting your application or the network. This supports the coexistence and registry MDM patterns, so it will also be useful for anyone else who wants to subscribe to updates so their application can use the canonical ID (assuming yours is).
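A Python sketch of the receiving end of that change-event push: the Unix side keeps a local SSN-to-EmpID cache and applies upsert/delete events in order. The event shape (op, ssn, emp_id) is hypothetical, not any particular MDM product's format:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class XrefChange:
    op: str                       # "upsert" or "delete" (hypothetical event vocabulary)
    ssn: str
    emp_id: Optional[str] = None  # unused for deletes


def apply_changes(cache: Dict[str, str], changes: Iterable[XrefChange]) -> Dict[str, str]:
    """Apply create/update/delete events, in order, to a local SSN -> EmpID cache."""
    for ch in changes:
        if ch.op == "upsert":
            cache[ch.ssn] = ch.emp_id
        elif ch.op == "delete":
            cache.pop(ch.ssn, None)  # deleting an unknown key is a no-op
        else:
            raise ValueError(f"unknown op {ch.op!r}")
    return cache
```

Order matters here: an update followed by a delete must be applied in that order, which is one reason the sequencing guarantees discussed in this thread are useful.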

Emtucifor wrote:P.S. Suggest some computer-to-computer transmission methods for me?


haha, sorry, a bit of a common gripe for integration people... FTP was really designed for person-to-person communication, but it was soon abused by systems to move files between each other. Of course you can use it for computer-to-computer transport, but it's clunky, slow, unreliable and insecure, not to mention inconsistently implemented. Hence the many Managed File Transfer offerings available in the past (there still are, actually).

For some scenarios you are better off looking at a message queuing technology, such as WebSphere MQ (very popular/common in the enterprise), though there are many others, e.g. SonicMQ, ActiveMQ, RabbitMQ, ZeroMQ, or of course any JMS-compliant messaging system. For something more open-standards based you could look at the quite new AMQP, which Red Hat are pushing forward at the moment as an open standard. Heck, even Microsoft had one, MSMQ, though I think that's kind of withered away now that WCF is their baby.

You can also look at other robust file transfer protocols, like OFTP (don't even bother comparing it with FTP; it's significantly better), or others more commonly used for B2B, such as AS1, AS2 (most common) and AS3 (which are actually an additional layer on top of the SMTP, HTTP and FTP wire protocols respectively), RNIF, or the myriad WS-* standards. Most of these will improve performance over FTP many times over (20+ times for some use cases), and offer reliability, security and FIFO sequencing, as well as load balancing, persistent or transient messaging and so on. TIBCO has a very fast messaging system that works well for small message sizes; it's proprietary, of course, and uses custom UDP packets for efficient publication to multiple subscribers (e.g. multicasting).

There are lots of ways, some old, some new, and it's not one size fits all. Bulk data transport should try to avoid the standard messaging tools: e.g. WMQ has a 100 MB maximum message size, so you need to start chunking your data. Some tools will do that for you, others won't, but either way it is inefficient. Hence, it really depends on your landscape topology (geographical issues etc.), message patterns/styles (e.g. sync vs. async vs. callback), sizes, volume, the features needed and so on. Because FTP is 'easy' (as in, it is a freely installable/built-in server on many systems, and is usable by many different clients, including humans via the CLI), it often becomes the default choice for many solutions, which is great until it has to scale, or provide reliability, etc.
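To make the chunking point concrete, here is a Python sketch of splitting a payload that exceeds a maximum message size into tagged pieces and reassembling them, possibly out of order. The (message_id, seq, total, payload) tuple is a hypothetical envelope, not any broker's API:

```python
import math
from typing import Dict, List, Tuple

Chunk = Tuple[str, int, int, bytes]  # (message_id, seq, total, payload)


def split_message(message_id: str, data: bytes, max_size: int) -> List[Chunk]:
    """Split data into chunks of at most max_size bytes, each tagged with seq/total."""
    total = max(1, math.ceil(len(data) / max_size))
    return [(message_id, i, total, data[i * max_size:(i + 1) * max_size])
            for i in range(total)]


def reassemble(chunks: List[Chunk]) -> bytes:
    """Rebuild the original bytes; chunks may arrive in any order."""
    if not chunks:
        raise ValueError("no chunks")
    if len({c[0] for c in chunks}) != 1 or len({c[2] for c in chunks}) != 1:
        raise ValueError("chunks from different messages")
    total = chunks[0][2]
    by_seq: Dict[int, bytes] = {seq: payload for _, seq, _, payload in chunks}
    missing = [i for i in range(total) if i not in by_seq]
    if missing:
        raise ValueError(f"missing chunks: {missing}")
    return b"".join(by_seq[i] for i in range(total))
```

The extra bookkeeping (ordering, gap detection, per-message IDs) is exactly the overhead that makes chunking bulk data through a messaging tool inefficient.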

Emtucifor wrote:I can see I need to change my avatar if I'm going to get ribbed about it. Haha.


Aww... there's no shame in being a beardy weirdo.. ;-) We will love you just the same (well kaht and Alex will love you in a special way, but that's your look out... ;-) )

Re: Is Disk Rename/Move Atomic?

Postby Emtucifor on Fri Jul 31, 2009 7:32 pm

Is FTP really that unreliable? So what then... upload the file, then download it again and make sure the bytes match? What if the upload is corrupted, and then the download is corrupted in a way that turns it back into the original file?

The reason I'm doing the SSN->EmployeeID conversion is that my coworker didn't want to do the conversion, just use SSN. I hated that. I don't want to be sending SSNs needlessly. But events conspired to make me happy and force the required conversion. But my coworker is perhaps a bit... allergic to extra work, shall we say, so he didn't volunteer to let me give him an updated file. I agree with you that sending the key file to the UNIX machine daily would be best. But we live with what we live with. If I knew Perl I'd just fix it all for him.

Actually, I wish I had been involved in this project from the start because then we wouldn't be hijacking the data out of the interface stream (which is why it's being done in perl in the first place) but would simply be querying from the destination database. Then this whole thing about him not wanting to look up EmployeeID from SSN wouldn't even be an issue because I'd just do more database lookups as needed. Sigh.

Re: Is Disk Rename/Move Atomic?

Postby damber on Fri Jul 31, 2009 9:25 pm

Emtucifor wrote:Is ftp so unreliable?


Yes.

For general usage you can get away with FTP, especially at low volumes; often you won't even notice a blip and can handle the odd problem easily. But you can get hung connections all over the place, dropped data connections with no retries, unresponsive command connections, no transaction control, obviously no XA/two-phase commit, unhandled packet loss, no non-repudiation audit trail, and all sorts of issues depending on the actual FTP server you're talking to, and quite commonly timeouts across unreliable networks, such as WANs to countries with limited infrastructure, or satellite links, etc.
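Since several of these failures surface as timeouts or transient errors, a common partial mitigation is a timeout plus bounded retries. A Python sketch using the standard ftplib.all_errors exception tuple; the backoff numbers are arbitrary:

```python
import ftplib
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Run op(), retrying on FTP/network errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except ftplib.all_errors:
            if attempt == attempts:
                raise  # out of retries: let the caller raise an alert
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In practice you would pass timeout= to ftplib.FTP(...) so that a hung connection raises instead of blocking forever, and retry the whole upload-then-rename step. Note that if a rename succeeded but its reply was lost, a retry can deliver the file twice, so the receiver should tolerate duplicates.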

If you want to try it, run a few thousand concurrent FTP connections to different servers and process a few million transactions per day (even at that relatively low volume you'll see the problems clearly). You'll spend most of your support hours trying to figure out which transfer failed and why, unless you build things around FTP to manage its inadequacies. This is why there are such things as reliable messaging protocols; FTP is a long-term thorn in the side of people wanting to make systems communicate. Often the problem is 'other people's FTP servers', MS being particularly flaky, but not the worst.

Also, opening an FTP connection, transmitting and closing for a 2 KB message takes about 20x longer than WebSphere MQ on average at reasonable volumes, and consumes a lot more memory and processor resources in comparison.

Emtucifor wrote:So, what then... upload the file, then download it, and make sure the bytes match? What if the corrupted upload is corrupted while it is downloaded so that it returns to the original file?


Message digests and checksums, sequence numbers and so on help a lot when used as part of the message envelope, though each participant needs to be aware of them and implement features to handle them. However, several protocols have techniques built in to solve reliability and performance issues, including compression, encryption, non-repudiation, reliable messaging, two-phase commits, etc.
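A minimal Python sketch of such an envelope: a one-line JSON header with a sequence number, length and SHA-256 digest ahead of the payload, checked on receipt. The header layout is an assumption for illustration only. It also answers the "upload then download and compare" idea: the receiver verifies the digest itself, with no round trip. The digest must match the bytes as they arrive, so an ASCII-mode transfer that rewrites line endings would break it; such files would need to go in binary mode.

```python
import hashlib
import json


def wrap(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with a one-line JSON header carrying seq, length and SHA-256."""
    header = {"seq": seq, "len": len(payload),
              "sha256": hashlib.sha256(payload).hexdigest()}
    return json.dumps(header).encode("ascii") + b"\n" + payload


def unwrap(message: bytes, expected_seq: int) -> bytes:
    """Validate the header and return the payload, or raise ValueError."""
    header_line, _, payload = message.partition(b"\n")
    header = json.loads(header_line)  # JSONDecodeError is a ValueError subclass
    if header["seq"] != expected_seq:
        raise ValueError(f"out of sequence: got {header['seq']}, expected {expected_seq}")
    if header["len"] != len(payload):
        raise ValueError("truncated or padded payload")
    if hashlib.sha256(payload).hexdigest() != header["sha256"]:
        raise ValueError("digest mismatch: payload corrupted in transit")
    return payload
```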

...Anyway, I would say that it sounds like the FTP thing is the least of your problems - does anyone actually design the solutions at your company, or is it the 'best efforts' of programmers in different areas - hoping that somehow they'll come to a consistent design? I don't mean that to sound facetious, just that from the sounds of it, they could do with some 'solution design' rather than just 'programming'. Why don't you push for it yourself? I'm sure you would do a good job and it would certainly improve things for your company to have an overall design perspective on your solutions, instead of programmer warfare :-)

Re: Is Disk Rename/Move Atomic?

Postby Emtucifor on Fri Jul 31, 2009 11:43 pm

We officially "don't do development."

That means that managers don't understand it, there's no support for it as a holistic whole, and each person is given to do only what he claims he can do for himself. There is no understanding of the length of time quality development takes, or the difficulties in one person being an entire team (spec writer, UI designer, needs assessment, subject matter expert/customer interface, project manager, developer, tester). I am struggling because no one helps me plan, test, manage, design or anything. And I don't have experience with these, not even working with people who performed these competently so I could borrow a few tricks. I am making it all up as I go along.

So yeah, I have no power, no backing if I think something should be done a certain way, there is no person who has enough breadth to understand the design of entire interrelated systems. It's all down to what I think I can accomplish, who I can convince, and what I can do under the radar without getting in trouble... and then a bunch of stress when I'm wrong about any of those and am committed to something I am having trouble delivering.

I feel like sometimes I perform miracles getting done what I do to the level of quality that I do, but no one recognizes they are miracles. You know, to the boss, your application is more about "how it looks" and "whether it breaks" than what it actually does. I could build the Golden Gate Bridge and I'd be judged more on whether it looks pretty and whether people always get to work on time over it (which has little to do with the quality of the bridge itself) than on how likely it is to fall down, how innovative or amazing its construction is, or the technical design and materials hurdles that had to be crossed in order to create it.

Long story short: programmer wars is the rule of the day here.