Download Comics With Transcripts

bridgeyman
Posts: 16
Joined: Wed Jan 23, 2008 8:53 pm UTC

Download Comics With Transcripts

Postby bridgeyman » Wed Jan 23, 2008 9:17 pm UTC

Hey,
I would like to make my own archive of xkcd comics to play with. I have seen that several people have already come up with code to download the images and even the alt text. These are great, but I would like to download the transcript from ohNoRobot.com also, for my own local searching. I just can't find a consistent way to have Oh No Robot return the transcript for any one comic. Has anyone been able to do this? Does anybody know if there would be any copyright issues with making a copy of the transcripts? I assume they would be treated the same as the comics themselves.

TTFN
Bridgeyman

hotaru
Posts: 1010
Joined: Fri Apr 13, 2007 6:54 pm UTC

Re: Download Comics With Transcripts

Postby hotaru » Wed Jan 23, 2008 9:30 pm UTC

Well, you can grab all of them at once with this link:
http://ohnorobot.com/archive.pl?comic=56&page=&show=2
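A minimal sketch of fetching that page from a script, for anyone who wants to automate it. The query parameters (comic=56, an empty page, show=2) are copied straight from the link above; the function names are made up for illustration, and nothing here parses the page, since its layout isn't described in this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the link above.
ARCHIVE = "http://ohnorobot.com/archive.pl"


def archive_url(comic_id=56, show=2):
    """Build the archive URL; 56 and 2 are just the values from the link above."""
    query = urlencode({"comic": comic_id, "page": "", "show": show})
    return ARCHIVE + "?" + query


def fetch_archive(comic_id=56, show=2):
    """Download the raw archive page as bytes. No parsing is attempted."""
    with urlopen(archive_url(comic_id, show)) as resp:
        return resp.read()
```

Calling fetch_archive() and writing the result to a file gives you a local copy to search or pick apart later.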

Code: Select all

uint8_t f(uint8_t n)
{
 if (!(n&1)) return 2;
  if (n==169||n==221) return 13; if (n==121||n==143) return 11;
  if (n==77||n==91) return 7; if (n==3||n==5) return 0;
  n=(n>>4)+(n&0xF); n+=n>>4; n&=0xF;
  return (n==3||n==6||n==9||n==12||n==15)?3:(n==5||n==10)?5:0; } 

bleh
Posts: 5
Joined: Wed May 14, 2008 12:23 pm UTC
Location: Georgia

Re: Download Comics With Transcripts

Postby bleh » Wed May 14, 2008 12:29 pm UTC

Where can I find the code to download all the comics?

Xanthir
My HERO!!!
Posts: 4980
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex

Re: Download Comics With Transcripts

Postby Xanthir » Wed May 14, 2008 3:15 pm UTC

You can find 'the code' by searching (on google or on this very forum), or by rolling your own.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

the mishanator
Posts: 209
Joined: Mon May 31, 2010 6:49 pm UTC

Re: Download Comics With Transcripts

Postby the mishanator » Mon May 31, 2010 6:54 pm UTC

If you're using Linux (or, I guess, Mac), just make a file called, say, xkcd and paste this code into it:

Code: Select all

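#!/bin/bash

mkdir -p "$HOME/Desktop/xkcd_comics"
cd "$HOME/Desktop/xkcd_comics" || exit 1
# Change the 1000 to whatever number of comics you want.
# Numbering starts at 1; there is no comic 0.
for ((i=1;i<=1000;i++)); do
   # Skip numbers that can't be fetched (e.g. comics that don't exist yet).
   wget -q -O index.html "http://xkcd.com/$i/" || { rm -f index.html; continue; }
   # The "Image URL" heading is assumed to sit on line 90 of the page.
   link=$(head -n 90 index.html | tail -n 1 | sed -e 's|<h3>Image URL (for hotlinking/embedding): ||g' -e 's|</h3>||g')
   name=$(echo "$link" | sed -e 's|http://imgs.xkcd.com/comics/||g')
   newName="(${i})-${name}"
   rm index.html
   wget -O "$newName" "$link"
done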


Then make it executable:

Code: Select all

chmod +x xkcd

and finally run the script with

Code: Select all

./xkcd


This will download comics 1-1000 (I know there aren't that many, but just to be safe) and put them in a folder called "xkcd_comics" on your Desktop.
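For anyone who would rather do the saving step in Python, here is a rough sketch of the same naming scheme, "(number)-filename". The function names and the default folder are made up for illustration, and it assumes you already have the image URL for a given comic from somewhere.

```python
import os
from urllib.request import urlretrieve


def local_name(number, image_url):
    """Return '(N)-file.ext', mirroring the naming used by the shell script."""
    filename = image_url.rsplit("/", 1)[-1]  # last path component of the URL
    return "(%d)-%s" % (number, filename)


def save_comic(number, image_url, folder="xkcd_comics"):
    """Download one image into `folder` and return the path it was saved to."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, local_name(number, image_url))
    urlretrieve(image_url, path)
    return path
```

For example, local_name(333, "http://imgs.xkcd.com/comics/example.png") gives "(333)-example.png" (the file name example.png is invented here).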

PM 2Ring
Posts: 3579
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Mid north coast, NSW, Australia

Re: Download Comics With Transcripts

Postby PM 2Ring » Fri Jun 04, 2010 6:58 am UTC

There's no need to use a ceiling like 1000. You can get the current comic number using

Code: Select all

curl -s xkcd.com | egrep -o 'http://xkcd.com/([1-9][0-9]*)' | sed 's/[^0-9]//g'
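The same idea as a Python sketch, in case the number is needed inside a script. It applies the regex from the pipeline above to HTML you have already downloaded. Taking the largest match when several links appear is an assumption on my part, and the function name is made up.

```python
import re

# Same pattern as the egrep call above, with the dots escaped.
COMIC_LINK = re.compile(r"http://xkcd\.com/([1-9][0-9]*)")


def latest_comic_number(html):
    """Return the highest comic number linked in the front-page HTML."""
    numbers = [int(n) for n in COMIC_LINK.findall(html)]
    if not numbers:
        raise ValueError("no comic link found in the page")
    return max(numbers)
```

The result can then replace the hard-coded 1000 as the upper bound of a download loop.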

phlip
Restorer of Worlds
Posts: 7467
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Download Comics With Transcripts

Postby phlip » Tue Jun 08, 2010 6:41 am UTC

Or you can just use the JSON, like all the sane people do.

Code: Select all

enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}
[he/him/his]

PM 2Ring
Posts: 3579
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Mid north coast, NSW, Australia

Re: Download Comics With Transcripts

Postby PM 2Ring » Tue Jun 08, 2010 8:00 am UTC

phlip wrote:Or you can just use the JSON, like all the sane people do.

Ah. That makes life a lot simpler. :)

Code: Select all

n=333; wget -q -O - http://xkcd.com/$n/info.0.json | sed 's/, /\n/g;s/"//g;s/{//;s/}/\n/'

Of course, parsing data formatted that way using sed like that is hardly foolproof. But it's a breeze to parse in Python.

Code: Select all

#! /usr/bin/env python

''' Fetch details for a given xkcd comic number.
    If no number is given, fetch the latest data.   
'''

import sys, urllib2

Base = 'http://xkcd.com/'   
Tail = 'info.0.json'

def main():
    n = len(sys.argv) > 1 and sys.argv[1] or ''   
    url = '%s%s/%s' % (Base, n, Tail)   
    #print url
   
    f = urllib2.urlopen(url)
    a = f.readline()
    f.close()
   
    exec('d=' + a)
    for k in d:
        print '%s: %s' % (k, d[k])

if __name__ == '__main__':
    main()

phlip
Restorer of Worlds
Posts: 7467
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Download Comics With Transcripts

Postby phlip » Tue Jun 08, 2010 8:11 am UTC

Personally, I'd avoid using exec or eval and instead use the json module... so that when davean puts "os.system('rm -rf ~ /')" in the JSON file next April fool's, everything is still good.
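A minimal sketch of what that could look like with the json module (written for Python 3, so urllib.request rather than urllib2). json.loads only ever builds plain data (dicts, lists, strings, numbers), so a string like the one above stays a harmless string. The function names are made up, and which keys the file contains (a transcript, say) is something to check against the real output rather than take from this sketch.

```python
import json
from urllib.request import urlopen

BASE = "http://xkcd.com/"
TAIL = "info.0.json"


def info_url(number=None):
    """URL of the JSON for comic `number`, or for the latest comic if omitted."""
    middle = "%d/" % number if number else ""
    return BASE + middle + TAIL


def parse_info(raw):
    """Parse the JSON text into a dict without executing anything in it."""
    return json.loads(raw)


def fetch_info(number=None):
    """Download and parse the JSON for one comic."""
    with urlopen(info_url(number)) as resp:
        return parse_info(resp.read().decode("utf-8"))
```

Usage would be something like looping over fetch_info(333).items() and printing each key and value.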


PM 2Ring
Posts: 3579
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Mid north coast, NSW, Australia

Re: Download Comics With Transcripts

Postby PM 2Ring » Tue Jun 08, 2010 8:47 am UTC

phlip wrote:Personally, I'd avoid using exec or eval and instead use the json module... so that when davean puts "os.system('rm -rf ~ /')" in the JSON file next April fool's, everything is still good.

Ah. Good point.

Unfortunately, I don't have that module on this ancient machine. (I really ought to get myself organized & try to upgrade the OS.)

the mishanator
Posts: 209
Joined: Mon May 31, 2010 6:49 pm UTC

Re: Download Comics With Transcripts

Postby the mishanator » Tue Jun 15, 2010 9:36 pm UTC

PM 2Ring wrote:There's no need to use a ceiling like 1000. You can get the current comic number using

Code: Select all

curl -s xkcd.com | egrep -o 'http://xkcd.com/([1-9][0-9]*)' | sed 's/[^0-9]//g'


Good point, I didn't think of that...

