Download Comics With Transcripts

Postby bridgeyman » Wed Jan 23, 2008 9:17 pm UTC

Hey,
I would like to make my own archive of xkcd comics to play with. I have seen that several people have already come up with code to download the images and even the alt text. These are great, but I would also like to download the transcripts from ohNoRobot.com for my own local searching. I just can't find a consistent way to make Oh No Robot return the transcript for a given comic. Has anyone been able to do this? Does anybody know whether making a copy of the transcripts would raise any copyright issues? I assume they would be treated the same as the comics themselves.

TTFN
Bridgeyman

Re: Download Comics With Transcripts

Postby hotaru » Wed Jan 23, 2008 9:30 pm UTC

Well, you can grab all of them from this page:
http://ohnorobot.com/archive.pl?comic=56&page=&show=2
Code: Select all
uint8_t f(uint8_t n)
{
  if (!(n&1)) return 2;
  if (n==169) return 13;
  if (n==121||n==143) return 11;
  if (n==77||n==91) return 7;
  if (n==3||n==5) return 0;
  n=(n>>4)+(n&0xF); n+=n>>4; n&=0xF;
  return (n==3||n==6||n==9||n==12||n==15)?3:(n==5||n==10)?5:0;
}

Re: Download Comics With Transcripts

Postby bleh » Wed May 14, 2008 12:29 pm UTC

Where can I find the code to download all the comics?

Re: Download Comics With Transcripts

Postby Xanthir » Wed May 14, 2008 3:15 pm UTC

You can find 'the code' by searching (on Google or on this very forum), or by rolling your own.

Re: Download Comics With Transcripts

Postby the mishanator » Mon May 31, 2010 6:54 pm UTC

If you're using Linux (or, I guess, Mac), just make a file called, say, xkcd and paste this code into it:
Code: Select all
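#!/bin/bash

mkdir -p "$HOME/Desktop/xkcd_comics"
cd "$HOME/Desktop/xkcd_comics" || exit 1
# change the 1000 to whatever number of comics you want
for ((i=1; i<=1000; i++)); do
   # skip numbers that have no page (e.g. past the latest comic)
   wget -q -O index.html "http://xkcd.com/$i/" || continue
   # assumes the image URL is on line 90 of the page
   link=$(head -n 90 index.html | tail -n 1 | sed -e 's|<h3>Image URL (for hotlinking/embedding): ||g' -e 's|</h3>||g')
   # strip the host and path so only the image file name is left
   name=$(echo "$link" | sed -e 's|http://imgs.xkcd.com/comics/||g')
   newName="(${i})-${name}"
   rm index.html
   wget -O "$newName" "$link"
done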


Then make it executable:
Code: Select all
chmod +x xkcd

and finally run the script with
Code: Select all
./xkcd


This will download comics 1-1000 (I know there aren't that many, but just to be safe) and put them in a folder called "xkcd_comics" on your Desktop.

Re: Download Comics With Transcripts

Postby PM 2Ring » Fri Jun 04, 2010 6:58 am UTC

There's no need to use a ceiling like 1000. You can get the current comic number using
Code: Select all
curl -s xkcd.com | egrep -o 'http://xkcd.com/([1-9][0-9]*)' | sed 's/[^0-9]//g'
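
The same idea can be sketched in Python. This is only an illustration: it assumes the front page contains a link of the form http://xkcd.com/NNN, as the egrep pattern above expects, and the function name latest_comic_number and the sample text are made up for the example. Unlike egrep -o, it returns only the first match.

```python
import re

# Same pattern as the egrep call above, with the dot escaped.
PERMALINK = re.compile(r'http://xkcd\.com/([1-9][0-9]*)')

def latest_comic_number(html):
    """Return the first comic number found in the page text, or None."""
    match = PERMALINK.search(html)
    return int(match.group(1)) if match else None

if __name__ == '__main__':
    # Hypothetical snippet standing in for the real front page.
    sample = 'Permanent link to this comic: http://xkcd.com/750/'
    print(latest_comic_number(sample))
```

The result could then replace the hard-coded 1000 as the upper bound of a download loop.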

Re: Download Comics With Transcripts

Postby phlip » Tue Jun 08, 2010 6:41 am UTC

Or you can just use the JSON, like all the sane people do.

Re: Download Comics With Transcripts

Postby PM 2Ring » Tue Jun 08, 2010 8:00 am UTC

phlip wrote:Or you can just use the JSON, like all the sane people do.

Ah. That makes life a lot simpler. :)
Code: Select all
n=333; wget -q -O - http://xkcd.com/$n/info.0.json | sed 's/, /\n/g;s/"//g;s/{//;s/}/\n/'

Of course, parsing data formatted that way using sed like that is hardly foolproof. But it's a breeze to parse in Python.
Code: Select all
#! /usr/bin/env python

''' Fetch details for a given xkcd comic number.
    If no number is given, fetch the latest data.   
'''

import sys, urllib2

Base = 'http://xkcd.com/'   
Tail = 'info.0.json'

def main():
    n = len(sys.argv) > 1 and sys.argv[1] or ''   
    url = '%s%s/%s' % (Base, n, Tail)   
    #print url
   
    f = urllib2.urlopen(url)
    a = f.readline()
    f.close()
   
    exec('d=' + a)
    for k in d:
        print '%s: %s' % (k, d[k])

if __name__ == '__main__':
    main()

Re: Download Comics With Transcripts

Postby phlip » Tue Jun 08, 2010 8:11 am UTC

Personally, I'd avoid using exec or eval and instead use the json module... so that when davean puts "os.system('rm -rf ~ /')" in the JSON file next April fool's, everything is still good.
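
A minimal sketch of that suggestion, written for Python 3 rather than the Python 2 of the script above. The helper names info_url, parse_info and fetch_info are invented for the example, the sample data is made up, and the "transcript" key is an assumption about what the JSON contains. json.loads only parses data, so a hostile string stays a string and is never executed.

```python
import json
from urllib.request import urlopen

BASE = 'http://xkcd.com/'
TAIL = 'info.0.json'

def info_url(n=None):
    """Build the JSON URL for comic n, or for the latest comic if n is None."""
    return BASE + TAIL if n is None else '%s%d/%s' % (BASE, n, TAIL)

def parse_info(raw):
    """Parse the JSON text (str or bytes) into a dict without executing it."""
    if isinstance(raw, bytes):
        raw = raw.decode('utf-8')
    return json.loads(raw)

def fetch_info(n=None):
    """Download and parse the details for one comic (needs network access)."""
    with urlopen(info_url(n)) as f:
        return parse_info(f.read())

if __name__ == '__main__':
    # Made-up sample data; the hostile-looking value is just an ordinary string.
    sample = b'{"num": 1, "title": "os.system(\'rm -rf ~ /\')", "transcript": ""}'
    for key, value in sorted(parse_info(sample).items()):
        print('%s: %s' % (key, value))
```

Keeping the parsing separate from the download means the parser can be tried on saved files without touching the network.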

Re: Download Comics With Transcripts

Postby PM 2Ring » Tue Jun 08, 2010 8:47 am UTC

phlip wrote:Personally, I'd avoid using exec or eval and instead use the json module... so that when davean puts "os.system('rm -rf ~ /')" in the JSON file next April fool's, everything is still good.

Ah. Good point.

Unfortunately, I don't have that module on this ancient machine. (I really ought to get myself organized & try to upgrade the OS.)

Re: Download Comics With Transcripts

Postby the mishanator » Tue Jun 15, 2010 9:36 pm UTC

PM 2Ring wrote:There's no need to use a ceiling like 1000. You can get the current comic number using
Code: Select all
curl -s xkcd.com | egrep -o 'http://xkcd.com/([1-9][0-9]*)' | sed 's/[^0-9]//g'


Good point, I didn't think of that...
