Lazy IO in Haskell

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Prelates, Moderators General

Lazy IO in Haskell

Postby phlip » Wed Oct 21, 2009 2:52 am UTC

Haskell newguy question, with more than a little stream-of-consciousness...

I have an expensive IO action I want to memoise and run lazily... which hasn't actually been written yet, I'm thinking in generalities at the moment, but specifically it's going to be a HTTP fetch (which means I still need to look into how to do HTTP stuff in Haskell too, but I assume there's a library for it somewhere)... I want it to only download the pages I actually want to look at, and memoise them so I don't download the same page twice.

So, I thought I'd do something like the list-of-lazy-values thing for pure functions:
Code: Select all
-- f :: Int -> a
fMemoised = let memo = map f [0..] in (memo !!)
So values stay un-calculated until you need them, and then they're only calculated once.

Now, I know it's possible to do something like this, because it's basically what interact does... it doesn't read the stdin all at once, you can read the first bit, and leave the rest being lazy (just try "interact id" in ghci)... so I'd like to get a [String] that nominally contains all the pages, but will only actually download them when I try to access those Strings.

The closest I've seen is sequence, but that isn't lazy... if I try something like:
Code: Select all
-- downloadPage :: Int -> IO String
-- processPages :: [String] -> something useful
main = do
   let downloaders = map downloadPage [1..10]
   pages <- sequence downloaders
   let result = processPages pages
then it'll download all the pages at once, and then process them... rather than starting the processing, and downloading the pages as necessary.

But of course, I'm not tied to this particular way of doing it, and any other way that'll let me lazily download the pages would be just as good.

Is there any way to do this without just making processPages an IO action, and putting all the laziness handling in there?
While no one overhear you quickly tell me not cow cow.
but how about watch phone?
User avatar
phlip
Restorer of Worlds
 
Posts: 7186
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Lazy IO in Haskell

Postby Berengal » Wed Oct 21, 2009 6:45 am UTC

There is a function called unsafeInterleaveIO, but it truly is the road to madness. Also, don't come complaining when you've suddenly got a bajillion open network connections and file descriptors or the webpages start chanting satanic rituals, because the only advice you'll get (the only advice one could give without going insane) is "don't do that then". You want to separate IO from the logic, which is an admirable goal, but once you've done that you can't have your logic drive the IO anymore.

If you want to send around possibly downloaded webpages, my initial recommendation would be to pass around something with a type like 'Map URL (Either (IO String) String)', but there are several ways of doing this which doesn't involve lazy IO. Which is best depends on context.
It is practically impossible to teach good programming to students who are motivated by money: As potential programmers they are mentally mutilated beyond hope of regeneration.
User avatar
Berengal
Superabacus Mystic of the First Rank
 
Posts: 2707
Joined: Thu May 24, 2007 5:51 am UTC
Location: Bergen, Norway

Re: Lazy IO in Haskell

Postby phlip » Wed Oct 21, 2009 7:22 am UTC

Berengal wrote:unsafeInterleaveIO

I guess it's called "unsafe" for a reason... I think I'll pass. And, from the looks of the doc, this is how things like interact and getContents and such actually work... figures my one "aha! I know it's possible, because of this!" case would work by magic...

Berengal wrote:Which is best depends on context.

Well, here's what I'm trying to do (at first, anyway, I'm hoping that eventually I'll end up with some kind of library of forum-crawling tools to play with)... take something like the Fleeting Thoughts thread in General, and get all the different names it's had, and when it changed. Now, it could just download all 1000-odd pages, and get all the post subject lines from there, but that'd be a lot of pointless downloading for not that many topic changes.

So I thought, something like a binary search... get the first page, and the last page, and then, recursively: get the middle page, see if the topic titles are different there. If the titles on the middle page and the first page are different, then recurse on the first half, ditto for the second half. But if the topics on the pages at either end of a range are the same, there's no need to get any pages within that range. Eventually, it will have downloaded every page that contains a topic change (it may miss a spot where the topic is changed and then changed back, but that's fine by me).

So that involves only downloading some pages (exactly which ones to be determined by logic on ones it's already downloaded) and referring back to pages (or, at least, the parsed results of pages) it had previously downloaded (without having to download and parse them again).

I was hoping I'd be able to get that free, or at least cheaply, with Haskell's laziness, but it seems that's not the case? If the whole binary search algorithm has to end up in IO, I'd probably just do it in Python, since I don't think Haskell would be gaining me a whole lot...
While no one overhear you quickly tell me not cow cow.
but how about watch phone?
User avatar
phlip
Restorer of Worlds
 
Posts: 7186
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Lazy IO in Haskell

Postby Berengal » Wed Oct 21, 2009 4:45 pm UTC

You don't want lazy evaluation, you want lazy execution, and haskell doesn't have that (appart from the unsafe functions, which are provided mostly so people can poke at the runtime without actually having to hack the compiler), nor should any language have that.

It's rather surprising, I know, but haskell isn't a silver bullet.

Cabal, however is, at least in this scenario. cabal install hxt, be prepared to learn some weird new syntax and operators. I recently discovered the power of hxt when it comes to xml (and html) parsing and manipulation, and it completely blew my mind, as well as introducing me to arrows. (Arrows are like monads; radioactive burritos in space in a spacesuit, except this time they're travelling at the speed of light while eating the apples. Don't worry too much about them at first, but concentrate on the specific ones hxt provides, and pretend they're the only ones. Zen will come in time.) Long story short: Trees become lists become points become lists become trees, as if by magic, and your work becomes so much simpler.

There are some decent tutorials on the haskell wiki. Most likely you'll be able to figure out the basics quickly, then get stuck on the paradigm shift until finally you'll braingasm all over the place when things make sense.

It's a learning experience though, and you won't get done as quickly as you would writing it in python or in the IO monad, but everything will make so much more sense later on.
It is practically impossible to teach good programming to students who are motivated by money: As potential programmers they are mentally mutilated beyond hope of regeneration.
User avatar
Berengal
Superabacus Mystic of the First Rank
 
Posts: 2707
Joined: Thu May 24, 2007 5:51 am UTC
Location: Bergen, Norway


Return to Coding

Who is online

Users browsing this forum: No registered users and 8 guests