Jul 28

Major algorithm just finished for The Daily Spike

For several days now, I’ve been working on a new algorithm for story detection and identification – and I’ve just finished the first step of it! Let me explain quickly what it’s trying to do:

The purpose of The Daily Spike is to monitor the entire blogosphere (vs. select blogs on select topics – that’s easy) and identify what has suddenly piqued the interest of bloggers. In other words, I’m attempting to build an A.P. news wire using the blogosphere as the news source. Once the system notices that, say, “dog fighting” has suddenly been mentioned a lot more than usual, it has to figure out what the actual topic of the story is. By analyzing thousands of pieces of data with various statistical measures, it has to come up with “Michael Vick” or “Michael Vick indicted” or something else that identifies the actual subject and point of the story. This may sound easy, but programmatically determining the topic of a story with any kind of accuracy, when your program has no idea what the story is about, is very difficult. The difficulty increases exponentially when you consider that blogs are free-form writing and nearly every post will have a different take on a story. I’ve tried several different approaches to identifying the data, including using multiple words as the identifier, but I wanted to get it down to the one KEY word that would mean something to the person reviewing it.
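To make the two steps above a little more concrete, here is a rough Python sketch. It is not The Daily Spike's actual code, and the statistical measures are my stand-ins for illustration: a z-score against recent daily counts to flag a spike, and a simple frequency ratio (how often a word appears in posts about the spike versus in ordinary posts) to pick the key word. The function names `is_spike` and `key_terms`, the threshold of 3.0, and the sample posts are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def is_spike(history, today, threshold=3.0):
    """Return True if today's mention count is far above the recent norm.

    Assumption: "far above" means more than `threshold` standard
    deviations over the mean of `history` (a plain z-score test).
    """
    mu = mean(history)
    sigma = pstdev(history) or 1.0  # a flat history would give sigma == 0
    return (today - mu) / sigma > threshold


def key_terms(spike_posts, background_posts, top_n=3):
    """Rank words by how much more often they appear in spike posts.

    Assumption: each post is plain text, split on whitespace, and a word
    counts once per post no matter how often it repeats inside it.
    """
    def doc_freq(posts):
        counts = Counter()
        for post in posts:
            counts.update(set(post.lower().split()))
        return counts

    spike_df = doc_freq(spike_posts)
    back_df = doc_freq(background_posts)

    scores = {}
    for word, n in spike_df.items():
        spike_rate = n / len(spike_posts)
        # Add-one smoothing so unseen background words don't divide by zero.
        back_rate = (back_df[word] + 1) / (len(background_posts) + 1)
        scores[word] = spike_rate / back_rate

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


# Hypothetical sample data, invented purely for this illustration.
spike_posts = [
    "vick indicted for dog fighting",
    "dog fighting charges against vick",
    "vick dog fighting ring",
]
background_posts = [
    "my dog ate my homework",
    "fighting traffic this morning",
    "new phone review",
]
```

With this toy data, "dog" and "fighting" show up in every spike post but also in everyday posts, so they score lower than "vick", which never appears in the background. That is the intuition behind getting from the phrase that spiked to the one word a reviewer would recognize, though real blog text would need proper tokenization, stop-word handling, and a much larger background sample.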

So, while several things still need to be written to focus the results properly, you can read the Michael Vick story here. I’ve shared several spikes on Google Reader, which you can view here. I’ve tried to include stories from several different topics.

As we’re still actively developing, those spikes will likely be deleted in the next day or so, but I’ll always keep current ones in my Google shared feed. There are still several things we’re working on before we release, but hopefully it will be very soon!
