Tuesday, October 23, 2012

Web and Data Mining

This week's readings reminded me of a discussion our class had several weeks ago. In fact, I believe we were still meeting in the conference room for this talk! We were discussing the massive amount of information we put out on the internet: Facebook statuses, tweets, blogs, every Google search, and every click of the mouse.

If you remember the conversation, we talked about how historians might one day write the history of an average person's life based entirely on Facebook statuses, or the history of an event using millions of tweets and statuses from people across the globe. We also said that those historians would face too much information, a problem just as daunting as the scarcity of sources facing, say, a Dark Ages historian studying a particular poet.

Although all of this week's readings brought that conversation to mind, none made me want to revisit it more than William Turkel's Digital History Hacks.

What struck me about this article is that it had never occurred to me that one could research how people phrase what they type into a search engine. When searching for the history of a specific country, for instance, most people type "American History" rather than "the history of America." Oddly enough, the same logic does not apply when the search is about a particular subject, such as technology: people usually type "the history of technology" instead of "technology history." (Note: I got these examples from Turkel's article. Click the link above to see more of his examples.)
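Out of curiosity, here is a minimal sketch of what counting phrasings might look like. This is not Turkel's method; it simply assumes you already have a list of search queries as plain strings (the sample queries below are invented for illustration), and the function name `count_phrasings` is my own.

```python
from collections import Counter

def count_phrasings(queries, phrasings):
    """Count how many queries contain each phrasing (case-insensitive substring match)."""
    counts = Counter({p: 0 for p in phrasings})
    for q in queries:
        text = q.lower()
        for p in phrasings:
            if p.lower() in text:
                counts[p] += 1
    return counts

# Invented sample queries, for illustration only (not real search data).
sample_queries = [
    "American history timeline",
    "american history quiz",
    "the history of America for kids",
    "the history of technology",
    "history of technology inventions",
    "technology history",
]

print(count_phrasings(sample_queries, ["american history", "history of america"]))
print(count_phrasings(sample_queries, ["history of technology", "technology history"]))
```

Plain substring matching is crude (for example, "american history" would also match "latin american history"), but it shows the basic idea of comparing how often each phrasing turns up.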

This kind of analysis of what people type into search engines seems likely to attract the attention of sociologists (especially those who study language!) quickly, if it hasn't already!

This article relates closely to Dan Cohen's From Babel to Knowledge, in which Cohen discusses search engines that look for specific kinds of items or rely on trusted sites. A great example, and one that Cohen developed himself, is a search engine for syllabi. It looks only at syllabi by picking out documents whose words share the characteristics typical of a syllabus. In his syllabus search engine, you type in the subject you want to teach and get examples of how similar classes arranged their syllabi.
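To get a feel for the general idea of recognizing a type of document by the words it tends to contain, here is a toy sketch. It is not Cohen's code and doesn't claim to reflect how the Syllabus Finder actually worked; the marker words, the 0.4 threshold, and the function names are all my own assumptions.

```python
import re

# Hypothetical words that tend to appear in course syllabi (my guess, not Cohen's list).
SYLLABUS_MARKERS = {"syllabus", "readings", "week", "assignments", "grading",
                    "office", "hours", "prerequisites", "attendance"}

def syllabus_score(text):
    """Return the fraction of marker words that appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(SYLLABUS_MARKERS & words) / len(SYLLABUS_MARKERS)

def find_syllabi(documents, subject, threshold=0.4):
    """Keep documents that mention the subject and look enough like a syllabus."""
    subject = subject.lower()
    return [doc for doc in documents
            if subject in doc.lower() and syllabus_score(doc) >= threshold]

# Two made-up documents: one syllabus-like, one ordinary blog post.
docs = [
    "History 101 Syllabus. Office hours: Tuesday. Week 1 readings on the Civil War. "
    "Grading: two assignments.",
    "A blog post about Civil War history and my favorite books.",
]
print(find_syllabi(docs, "civil war"))
```

Only the first document passes, because it both mentions the subject and contains most of the marker words; the blog post mentions the Civil War but looks nothing like a syllabus.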

Before I go on, let me just say: I didn't know such a thing was possible. It astounds me that with a little programming know-how, you can build your own useful search engine about anything you want. Another example Cohen uses is H-bot, which looks only at trusted sites when responding to a question or keyword.

Unfortunately, I was unable to find the Syllabus Finder when I did a quick Google search for it. I would have loved to try it out, although it wouldn't be as useful to me now as it would have been last fall semester! If anyone else is able to locate the website, let me know!
