Reading a file as it is written

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Postby EvanED » Sun Jun 09, 2013 8:52 am UTC

Suppose Program A, which I cannot modify, is writing out a file. I want to write Program B, which reads the file as it is being written and does stuff. (In particular, I would like it to be able to operate on stuff early in the file before the file is done.) However, Program B is able to just read the data in a dumb way; if it weren't for the "as it's being written" part of the requirements, it'd just be a bunch of read() calls. I also do not need to detect the end of the file by reading off of the end and getting EOF: the file itself contains an "end" frame.

What I have right now is something like the following (recreated from memory, so may not be exact):

Code: Select all

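import os
import time

def size(file):
    # Current size of the open file, in bytes.
    return os.fstat(file.fileno()).st_size

def protected_read(file, length):
    # Wait until the writer has produced at least `length` more bytes.
    while size(file) < file.tell() + length:
        time.sleep(0.01)
    return file.read(length)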

and then I just use protected_read(f, l) in places where I'd otherwise have f.read(l).

Does anyone have any suggestions for a better way to do this? I don't request a fixed language (e.g. it could be something other than Python), but cross-platform is preferred and Windows is essential.

Xenomortis
Not actually a special flower.
Posts: 1447
Joined: Thu Oct 11, 2012 8:47 am UTC

Re: Reading a file as it is written

Postby Xenomortis » Sun Jun 09, 2013 10:03 am UTC

I'm pretty sure the Win32 API has tools for pausing threads in another process, at least one that you spawned.
That should give a bit more granularity.

But what's the problem with what you already have?

PM 2Ring
Posts: 3713
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Sydney, Australia

Re: Reading a file as it is written

Postby PM 2Ring » Sun Jun 09, 2013 11:42 am UTC

Xenomortis wrote:But what's the problem with what you already have?

Checking for filesystem changes in a loop is a bit gross, and wastes CPU, even with a sleep() call. It'd be much better to install an event listener that notifies you when the watched file is changed. Unfortunately, such tasks tend to be OS-specific, and so cross-platform solutions tend to look ugly under the hood.

I haven't needed to do such things in recent years, so I can't make any recommendations, but I just found this via a quick Google:
https://pypi.python.org/pypi/watchdog/0.5.4
Python API and shell utilities to monitor file system events.
[...]
Supported Platforms

Linux 2.6 (inotify)
Mac OS X (FSEvents, kqueue)
FreeBSD/BSD (kqueue)
Windows (ReadDirectoryChangesW with I/O completion ports; ReadDirectoryChangesW worker threads)
OS-independent (polling the disk for directory snapshots and comparing them periodically; slow and not recommended)

phlip
Restorer of Worlds
Posts: 7572
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Reading a file as it is written

Postby phlip » Sun Jun 09, 2013 3:00 pm UTC

I know you say Program A can't be modified, but can you change what filename it's writing to? Can you get it to write to a named pipe?

Code: Select all

enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}
[he/him/his]

TheChewanater
Posts: 1279
Joined: Sat Aug 08, 2009 5:24 am UTC
Location: lol why am I still wearing a Santa suit?

Re: Reading a file as it is written

Postby TheChewanater » Sun Jun 09, 2013 7:28 pm UTC

You could inject your own fwrite() that prints a line of text for each call, then run it with LD_PRELOAD and pipe stdout to your script. The advantage of this is that it doesn't require you to redundantly write to and read from the hard drive, but the disadvantage is that it won't work well if the program writes multiple files or a binary file, and it's also a horrible hack.

Code: Select all

$ cat my-fwrite.c
#include <stdio.h>

size_t fwrite(void const* ptr, size_t size, size_t count, FILE* file) {
    /* Assumes ptr is NUL-terminated text; this breaks on binary data. */
    printf("%s\n", (char const*)ptr);
    return count;  /* report every item as written */
}

$ gcc my-fwrite.c -fpic -shared -Wl,-soname,my-fwrite.so -o my-fwrite.so
$ LD_PRELOAD=./my-fwrite.so program-i-cant-modify | ./script.py


EDIT: I don't know if this would work on Windows or not.

hotaru
Posts: 1045
Joined: Fri Apr 13, 2007 6:54 pm UTC

Re: Reading a file as it is written

Postby hotaru » Sun Jun 09, 2013 7:39 pm UTC

it sounds like you want to do something like what "tail -f -n +1" does...

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: Reading a file as it is written

Postby EvanED » Mon Jun 10, 2013 6:34 am UTC

Xenomortis wrote:I'm pretty sure the Win32 API has tools for pausing threads in another process, at least one that you spawned.
Even if I could do that, and I can't, I don't see how it would help.

PM 2Ring wrote:I haven't needed to do such things in recent years, so I can't make any recommendations, but I just found this via a quick Google:
https://pypi.python.org/pypi/watchdog/0.5.4
I'm actually already using that for other purposes. (I use it to detect the creation of the file which I want to read in such a fashion.)

I thought about putting another watch on the file I'm interested in and doing stuff in the modification handler, but that seems too complex to be worth it (especially because I think watchdog starts up another thread to do the watching). Any opinion? It's not like my current solution is awful... and I can probably poll less often, say every 0.1 seconds.
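
One way to poll less often without losing responsiveness is to let the sleep grow while the file is idle and reset it as soon as the file grows. A rough sketch of that idea, reusing the os.fstat size check from the first post; wait_for_size and its parameters are hypothetical names, and the default intervals are arbitrary:

```python
import os
import time

def wait_for_size(file, target_size, min_sleep=0.01, max_sleep=0.5):
    """Block until the open file holds at least target_size bytes.

    Hypothetical helper: the sleep doubles while the file is idle
    (capped at max_sleep) and drops back to min_sleep when it grows.
    """
    delay = min_sleep
    last_size = os.fstat(file.fileno()).st_size
    while last_size < target_size:
        time.sleep(delay)
        current = os.fstat(file.fileno()).st_size
        if current > last_size:
            delay = min_sleep                   # writer is active: poll quickly
        else:
            delay = min(delay * 2, max_sleep)   # idle: back off
        last_size = current
    return last_size
```

A read helper could then call wait_for_size(f, f.tell() + n) before f.read(n). Note that there is no timeout, so this waits forever if the writer dies before the "end" frame.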

phlip wrote:I know you say Program A can't be modified, but can you change what filename it's writing to? Can you get it to write to a named pipe?
This was an outstanding idea. Alas, I gave it a try and it didn't work. :-(

TheChewanater wrote:EDIT: I don't know if this would work on Windows or not.
Windows doesn't have anything nearly as well-supported as LD_PRELOAD. The closest I know of is Detours, and I'm not sure the non-professional edition would work for me. I have other reasons for not liking the solution of injecting code into the target process too.

hotaru wrote:it sounds like you want to do something like what "tail -f -n +1" does...

Exactly, actually (except -c instead of -n because it's binary data). And while I of course recognized the correspondence, it didn't occur to me until you said that to check what tail does. I checked a couple of different implementations, and it looks like tail, at least sometimes, does roughly what I do.

Now, it does not check the file size, instead going off of what read() returns. I could probably improve mine by doing the same thing if I figure out what Python is doing to me. :-) (OTOH, it uses a much larger sleep than what I want; 0.25-1 sec.)
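
A minimal sketch of going off what read() returns rather than the file size, assuming Python 3 and a file opened unbuffered in binary mode, i.e. open(path, "rb", buffering=0), so that a read at the current end of the file just returns no bytes and a later read picks up whatever has been appended since. The name read_exactly and its parameters are made up for illustration, and the code is not taken from any tail implementation:

```python
import time

def read_exactly(f, n, poll_interval=0.1, timeout=None):
    """Return exactly n bytes from f, sleeping whenever a read comes back empty.

    Assumes f was opened with open(path, "rb", buffering=0).
    Raises TimeoutError if timeout (in seconds) elapses first.
    """
    chunks = []
    remaining = n
    deadline = None if timeout is None else time.monotonic() + timeout
    while remaining > 0:
        chunk = f.read(remaining)
        if chunk:
            chunks.append(chunk)
            remaining -= len(chunk)
            continue
        # Empty read: we are at the current end of the file, so wait for the writer.
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("writer produced no data in time")
        time.sleep(poll_interval)
    return b"".join(chunks)
```

This would be used wherever protected_read(f, l) is used now, e.g. header = read_exactly(f, 16). Partial reads are kept across sleeps, so a frame that arrives in several writes is still returned whole.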

speising
Posts: 2353
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: Reading a file as it is written

Postby speising » Mon Jun 10, 2013 6:28 pm UTC

doesn't reading on a stream pause anyway while it isn't closed on the writer's side?

