Is it Art? (Word Frequency Calculator)


Tux
Posts: 4
Joined: Wed Jul 19, 2006 3:06 pm UTC

Postby Tux » Thu Jul 20, 2006 3:48 am UTC

It's far from perfect, but it TRIES to separate the English from the code and give you a word frequency count for a given URL.

It's kinda fun. :)

http://glyn.minimanga.com/wordfreq.php

What do you guys think? :)

xkcd
Site Ninja
Posts: 365
Joined: Sat Apr 08, 2006 8:03 am UTC

Postby xkcd » Thu Jul 20, 2006 4:01 am UTC

Nice -- I think it found a hyphen on my page.

What I've always really wanted to see was something that would scan a sample and find what words that sample used with a higher frequency than normal for a larger sample. Like, you could give it a book and it would tell you what words that book used more often than normal for all books.

Once you drop out proper nouns, you'd have a kind of interesting list. I know that I way overuse, for example, "interesting" and "fascinating".

I also have some code that tried to find haikus in chat logs; that was fun.
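
The "more often than normal" idea above could be sketched roughly as follows. This is only an illustration in Python, not the code behind the linked page: `word_counts` and `overused_words` are made-up names, the add-one smoothing is an assumption so that words missing from the reference don't cause a division by zero, and it does nothing about proper nouns.

```python
from collections import Counter
import re

def word_counts(text):
    """Lowercase the text and count runs of letters (apostrophes allowed inside)."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))

def overused_words(sample, reference, top=5, min_count=2):
    """Rank words by how much more often they occur in sample than in reference."""
    s, r = word_counts(sample), word_counts(reference)
    s_total, r_total = sum(s.values()), sum(r.values())
    vocab = len(set(s) | set(r))
    scores = {}
    for word, count in s.items():
        if count < min_count:
            continue  # skip words too rare in the sample to say anything about
        # Add-one smoothed relative frequencies (an assumption of this sketch).
        p_sample = (count + 1) / (s_total + vocab)
        p_ref = (r[word] + 1) / (r_total + vocab)
        scores[word] = p_sample / p_ref
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

Something like `overused_words(my_posts, big_pile_of_text)` would then list the words most out of proportion in the sample. A real version would want a much larger reference corpus and probably a proper statistical test rather than a plain ratio.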

davean
Site Ninja
Posts: 2498
Joined: Sat Apr 08, 2006 7:50 am UTC

Postby davean » Thu Jul 20, 2006 4:16 am UTC

xkcd wrote:I also have some code that tried to find haikus in chat logs; that was fun.


You do? I totally forgot about that. We need to run that on the wump ASAP, man!

Nicolas Bourbaki
Posts: 18
Joined: Wed Jul 19, 2006 1:08 pm UTC
Location: greenbelt, MD

Postby Nicolas Bourbaki » Thu Jul 20, 2006 12:54 pm UTC

You might want to consider having it ignore words with non-alpha characters, such as 0-9, " ( ) . , / \ etc. Also, it looks like it's case sensitive, so if I start a sentence with "What" and have "what" again somewhere in the middle, it counts them separately. Not sure if that's what you intended.
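
For what it's worth, here is a rough sketch of both suggestions in Python rather than PHP (an assumption; I haven't seen the tool's source). `count_words` is a made-up name. It trims punctuation from the ends of each token, lowercases it, and drops anything that still contains a non-letter.

```python
from collections import Counter

def count_words(text):
    """Count whitespace-separated tokens case-insensitively, keeping only alphabetic ones."""
    counts = Counter()
    for token in text.split():
        # Trim punctuation stuck to either end, e.g. "what," or "(what)".
        word = token.strip("\"'()[]{}.,;:!?/\\").lower()
        # Drop tokens that still contain digits or symbols, e.g. "42" or "a/b".
        if word.isalpha():
            counts[word] += 1
    return counts
```

With this, "What" and "what" land in the same bucket. One side effect to be aware of: contractions like "don't" get dropped too, since the apostrophe is not a letter.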

DaveFP
Posts: 76
Joined: Thu Jul 20, 2006 11:43 pm UTC

Postby DaveFP » Fri Jul 21, 2006 2:37 am UTC

Hmm, I don't know any PHP, so I couldn't suggest how to improve this particular implementation. However, if I were to try something like this, I would use Perl with its embedded HTML parser to differentiate between the code and the actual content.

Actually, one big improvement that can be very quickly implemented would be to exclude anything inside an HTML comment. This would immediately rid you of any CSS info present (which, when declared in the header of an HTML doc, usually appears inside a comment).
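
A minimal sketch of the parser approach, using Python's standard `html.parser` module in place of Perl (a swap made for this example). `TextExtractor` and `visible_text` are hypothetical names. Comments are dropped because the base class ignores them, and `<script>`/`<style>` bodies are skipped explicitly, since the parser hands their contents over as plain data even when they are wrapped in a comment.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>/<style> bodies and comments."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

    # handle_comment keeps the base-class no-op, so <!-- ... --> is ignored.

def visible_text(html):
    """Return the page's text content with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

The output of `visible_text(page_source)` could then be fed to whatever does the word counting.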

kira
I hate bananas.
Posts: 904
Joined: Fri Apr 14, 2006 4:21 am UTC
Location: school

Postby kira » Fri Jul 21, 2006 4:59 am UTC

xkcd wrote:What I've always really wanted to see was something that would scan a sample and find what words that sample used with a higher frequency than normal for a larger sample. Like, you could give it a book and it would tell you what words that book used more often than normal for all books.


Doesn't Amazon.com have a feature similar to that? Ah, yes it does. In the searchable books, it gives a list of "Statistically Improbable Phrases", which is pretty awesome.

xkcd
Site Ninja
Posts: 365
Joined: Sat Apr 08, 2006 8:03 am UTC

Postby xkcd » Fri Jul 21, 2006 10:20 am UTC

kira wrote:Doesn't Amazon.com have a feature similar to that? Ah, yes it does. In the searchable books, it gives a list of "Statistically Improbable Phrases", which is pretty awesome.

When I go there I get the error message

amazon.com wrote:Important Message
Amazon.com is pleased to offer customers the ability to view copyrighted material from books that are part of the Search Inside! program. To protect this copyrighted material, books are subject to viewing controls.

To view this page, you must be signed in to an Amazon.com account that has made a purchase in the past. Your account has not made an eligible prior purchase. To learn more, please see our Frequently Asked Questions.

You are free to browse sample pages from this book by clicking the links in the Sections area of the Amazon Online Reader. See more information or continue shopping.


:( Which is kind of weird, 'cause I've been using SITB for a while, and I could swear I've bought off Amazon before.

kira
I hate bananas.
Posts: 904
Joined: Fri Apr 14, 2006 4:21 am UTC
Location: school

Postby kira » Sat Jul 22, 2006 6:59 am UTC

Apparently, dear, your reputation precedes you.

