Using Google to find TLAs

Actaeus
Posts: 606
Joined: Thu Jan 10, 2008 9:21 pm UTC
Location: ZZ9 Plural Z Alpha

Postby Actaeus » Tue Jun 24, 2008 10:05 pm UTC

I was wondering whether any three-letter acronyms (TLAs) would return 0 search results (I doubt it).
So I wrote some Python!

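import http.client
from itertools import product
from string import ascii_lowercase

# One persistent connection, reused for every query.
goog = http.client.HTTPConnection('www.google.com')
for letters in product(ascii_lowercase, repeat=3):
    tla = ''.join(letters)  # 'aaa', 'aab', ..., 'zzz'
    goog.request("GET", "/search?q=" + tla)
    res = goog.getresponse()
    # The body must be read in full before the connection can be reused.
    data = res.read().decode("utf-8", "replace")
    if "did not match any documents" in data:
        print(tla + " is available!")
    else:
        print(".", end=" ", flush=True)
goog.close()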

So, basically, I now have a few hundred ". . . . . . . . ." in my shell.
I'm wondering if I should have used some sort of API for this. It's awfully slow: almost half a second per lookup, and there are 26^3 = 17,576 combinations to check, so this could take a while.
Although it should be done with "Bxx" by now...

mountaingoat
Posts: 80
Joined: Wed Aug 01, 2007 6:01 am UTC

Re: Using Google to find TLAs

Postby mountaingoat » Wed Jun 25, 2008 5:32 am UTC

http://code.google.com/apis/ajaxsearch/web.html ? Maybe? I doubt it'd be much faster.

By the way, I don't think you'll find any.

robb
Posts: 13
Joined: Sun Aug 26, 2007 12:39 am UTC
Location: North Pole

Re: Using Google to find TLAs

Postby robb » Fri Jun 27, 2008 2:15 am UTC


poohat
Posts: 230
Joined: Mon Apr 07, 2008 6:21 am UTC

Re: Using Google to find TLAs

Postby poohat » Fri Jun 27, 2008 2:53 am UTC

At half a second per lookup, you only need to wait about two and a half hours for it to finish, so just leave it running overnight.
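As a rough check on that estimate, here is the arithmetic written out. It assumes every request takes a flat half second, which is the figure quoted in the first post; the constant names are only for illustration.

```python
# Back-of-the-envelope runtime for checking every three-letter string.
# Assumption: each lookup takes a flat 0.5 s, the figure quoted in the first post.
LETTERS = 26
SECONDS_PER_LOOKUP = 0.5

total_lookups = LETTERS ** 3                      # every string from "aaa" to "zzz"
total_seconds = total_lookups * SECONDS_PER_LOOKUP
total_hours = total_seconds / 3600

print(f"{total_lookups} lookups, about {total_hours:.1f} hours")
# prints: 17576 lookups, about 2.4 hours
```

Any throttling or retries on Google's side would push the real number higher.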

If you wanted to reduce the search time, you could do some preprocessing by removing the three-letter words that show up in a searchable dictionary or acronym list. Not sure how many that would be, though.
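That filtering step could be sketched roughly like this, assuming you already have some list of known three-letter words and acronyms. The `candidates` helper and the `known` sample are made up for illustration and are not a real dictionary.

```python
from itertools import product
from string import ascii_lowercase

def candidates(known_words):
    """Return every three-letter string that is not in known_words."""
    # Normalise case and ignore entries that are not exactly three letters long.
    known = {w.lower() for w in known_words if len(w) == 3}
    all_tlas = ("".join(p) for p in product(ascii_lowercase, repeat=3))
    return [tla for tla in all_tlas if tla not in known]

# Hypothetical sample; a real run would load a dictionary or acronym list.
known = ["cat", "dog", "FBI", "the", "tree"]
remaining = candidates(known)
print(len(remaining))  # 17576 - 4 = 17572
```

Only the strings left in `remaining` would then need an actual search request.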

Also, IMO it would have been cuter if you'd captured how many results were found for each three-letter word rather than just a binary yes/no (you could parse the 'Results 1 - 10 of about 152,000 for whatever' part).
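A minimal sketch of that parsing, assuming the page text contains a phrase shaped like the one quoted above. The exact wording is Google's and may differ or change, so the pattern and the `result_count` name are assumptions.

```python
import re

# Assumed format, taken from the example phrase quoted in the post above:
# "Results 1 - 10 of about 152,000 for whatever"
COUNT_PATTERN = re.compile(r"of about ([\d,]+) for")

def result_count(page_text):
    """Return the approximate hit count as an int, or None if not found."""
    match = COUNT_PATTERN.search(page_text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return int(match.group(1).replace(",", ""))

sample = "Results 1 - 10 of about 152,000 for whatever"
print(result_count(sample))  # 152000
```

The real response is HTML and probably wraps the numbers in markup, so you would likely need to strip the tags first or loosen the pattern.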

Robin S
Posts: 3579
Joined: Wed Jun 27, 2007 7:02 pm UTC
Location: London, UK

Re: Using Google to find TLAs

Postby Robin S » Fri Jun 27, 2008 2:59 am UTC

Well, Google indexes billions of webpages, so from random sequences alone I'd expect pretty much all character strings of about half a dozen characters or fewer to crop up somewhere. This page gives examples of word n-grams (as opposed to letter n-grams, which you're looking for), and should give you some idea of just how common search results are, even to the extent of multi-word phrases. I imagine letter n-grams are far more common even than that.
Specifically, it looks like 5.3 covers what you're currently doing. If you're only running a few thousand searches there's a fair chance it won't cause any problems, but it's still best to be aware of this sort of thing.

