This code will do mostly what you want.
Code:
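#!/bin/sh
i="${1-1}"
a="$i"
b=`expr $a + 49`
echo "grabbing xkcd $a-$b"
mkdir -p "xkcd $a-$b"
# NOTE: after the first pass, $? is the status of the final expr increment,
# which is normally 0, so the loop does not end by itself; stop it with Ctrl-C.
while [ "$?" = '0' ]; do
    # skip number 404 (xkcd.com/404 is not a comic)
    if [ "$i" = '404' ]; then i=`expr $i + 1`; fi
    echo " - grabbing xkcd $i"
    tmp=`tempfile`
    # keep only the title line and the comic's <img> line
    wget -qO- "xkcd.com/$i" | grep -e '^<img' -e '<h1>' > "$tmp"
    title=`sed -n -e 's/.*<h1>//' -e 's:</h1>.*::p' "$tmp"`
    img=`grep '^<img' "$tmp"`
    rm -f "$tmp"
    # append "NUMBER - alt text" to this group's Alt-text.nfo
    echo "$i - `echo "$img" | sed -e 's/.* title="//' -e 's/".*//'`" \
        >> "xkcd $a-$b/Alt-text.nfo"
    imgurl=`echo "$img" | sed -e 's/.* src="//' -e 's/".*//'`
    imgout="xkcd $a-$b/$i - $title.`echo $imgurl | sed 's/.*\.//'`"
    # quoted, because the output path contains spaces
    wget -qO- "$imgurl" > "$imgout"
    # start a new group of 50 once the current one is full
    if [ "$i" = "$b" ]; then
        a=`expr $a + 50`
        b=`expr $b + 50`
        echo "grabbing xkcd $a-$b"
        mkdir -p "xkcd $a-$b"
    fi
    i=`expr $i + 1`
done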
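To see what the sed expressions in the script pull out, here is a small sketch that runs them against two made-up lines of HTML. The sample markup and the helper names (get_title, get_alt, get_src, get_ext) are assumptions for illustration only; the real page layout on xkcd.com may differ.

```shell
# Hypothetical sample lines, shaped like what the script's grep keeps.
sample_h1='<div><h1>Example Title</h1></div>'
sample_img='<img src="http://example.com/comics/example.png" title="Example alt text" alt="Example Title" />'

# The same sed expressions the script uses, wrapped in helper functions.
get_title() { printf '%s\n' "$1" | sed -n -e 's/.*<h1>//' -e 's:</h1>.*::p'; }
get_alt()   { printf '%s\n' "$1" | sed -e 's/.* title="//' -e 's/".*//'; }
get_src()   { printf '%s\n' "$1" | sed -e 's/.* src="//' -e 's/".*//'; }
get_ext()   { printf '%s\n' "$1" | sed 's/.*\.//'; }

src=`get_src "$sample_img"`
echo "title: `get_title "$sample_h1"`"   # title: Example Title
echo "alt:   `get_alt "$sample_img"`"    # alt:   Example alt text
echo "src:   $src"                       # src:   http://example.com/comics/example.png
echo "ext:   `get_ext "$src"`"           # ext:   png
```

Note that the alt-text and src patterns rely on a space before the attribute name and on the value containing no double quote, so a page that quotes differently would need different patterns.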
Comments on using it:
- When you copy from the fora, it screws with indentation. This doesn't really matter, except `#!/bin/sh' MUST be all the way on the left.
- make sure the script is executable (`chmod +x FILENAME')
- put the file in the current directory, then run `./FILENAME'
- to continue/start working at a certain comic number, run it as `./FILENAME COMIC'
- HTML is not filtered out of the title. As far as I can tell, this only affects comic 472.
- It puts `Alt-text.nfo' in the same directory as the images (unlike your torrents).
- you still need to pack it all into a torrent
- yes, it organises them into groups of 50
- the disclaimer at the end of `Alt-text.nfo' is NOT added
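Two of the caveats above could be handled with a few extra lines. This is only a sketch, not part of the script: strip_tags and group_start are hypothetical helper names, and the sample title is made up. The first helper removes HTML tags from a title; the second finds the start of the 50-comic group that contains a given comic, since the script as written names the first folder after whatever number you start from (starting at 120 gives `xkcd 120-169').

```shell
# Remove anything that looks like an HTML tag from a title string.
strip_tags() { printf '%s\n' "$1" | sed 's/<[^>]*>//g'; }

# First comic of the group of 50 that contains comic $1 (1, 51, 101, ...).
group_start() { expr \( \( "$1" - 1 \) / 50 \) \* 50 + 1; }

strip_tags 'A <span class="x">Styled</span> Title'   # A Styled Title

a=`group_start 120`
b=`expr $a + 49`
echo "xkcd $a-$b"                                    # xkcd 101-150
```

If you wanted to use these, `title' would be passed through strip_tags after it is extracted, and `a' would be set with group_start instead of being copied from `i'.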
Other comments:
When I compared our `Alt-text.nfo' files, I noticed: 1) Randall puts 2 spaces after a period, you didn't; 2) you left out the text `with no context,' for comic 16; 3) in comic 39 you typed `its' rather than "it's".
Also, I noticed you type it as `XKCD'; Randall has explicitly stated he likes it in all lowercase (http://xkcd.com/about/). (I also recall him talking about why in an interview for wikinews.)
Tsk, tsk, avoiding manual labour/repetitive tasks is why we invented computers.
Anyway, this is a really cool project, and I'll seed the torrents.